Tuesday, November 08, 2005
Finding items with attributes in SQL
I have a MySQL database like this:
create table t (item int, attr int);
insert into t values (1, 1), (1,2), (2,1), (2,3), (3,2), (3,3), (4,1), (4,2);
I need to find items with a list of attributes, ie show me all items that have 1 and 2 or 1 and 3 or 1, 2, and 3, etc.
The answer was an inner join:
select t1.item
from t t1
inner join t t2 on t1.item=t2.item AND t2.attr=2
where t1.attr = 1;
So, the first criteria appears in the where clause. Subsequent criteria appear in the inner join condition which it turns out is treated as basically another where clause. The results are pared down by items that appear in both lists only. (Duh, that's an inner join!) Can you tell I'm no good at SQL? :(
There may be a better way to do this (group by?) but I hope this is reasonably optimized.
EDIT: The query does indeed seem nicely optimized when you create a unique index on item and attr:
alter table t add unique (item, attr);
Explain now shows:
So, the index is used in 1, which results in a constant getting plugged in to the second part (and the index is used again). Spiff.
create table t (item int, attr int);
insert into t values (1, 1), (1,2), (2,1), (2,3), (3,2), (3,3), (4,1), (4,2);
I need to find items with a list of attributes, ie show me all items that have 1 and 2 or 1 and 3 or 1, 2, and 3, etc.
The answer was an inner join:
select t1.item
from t t1
inner join t t2 on t1.item=t2.item AND t2.attr=2
where t1.attr = 1;
So, the first criteria appears in the where clause. Subsequent criteria appear in the inner join condition which it turns out is treated as basically another where clause. The results are pared down by items that appear in both lists only. (Duh, that's an inner join!) Can you tell I'm no good at SQL? :(
There may be a better way to do this (group by?) but I hope this is reasonably optimized.
EDIT: The query does indeed seem nicely optimized when you create a unique index on item and attr:
alter table t add unique (item, attr);
Explain now shows:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | t2 | index | item | item | 10 | NULL | 6 | Using where; Using index |
1 | SIMPLE | t1 | ref | item | item | 10 | builds.t2.item,const | 2 | Using where; Using index |
So, the index is used in 1, which results in a constant getting plugged in to the second part (and the index is used again). Spiff.