I was checking a third party server, and I needed to find if there were tables without primary keys. This is important to know, not only because the lack of primary keys affects performance and data accuracy in general, but also because in
row-based replication performance can degrade beyond belief when updating tables without primary keys. Anyway, I did not remember off the bat any method to get this information from a server with thousands of tables, and thus I went to find a solution on my own. My first instinct called for using the
COLUMNS
table from the
INFORMATIOn_SCHEMA
, and so I came up with this query, where I sum the number of columns that are inside either a PRIMARY or UNIQUE key and filter only the ones where such sum is zero (i.e. no primary or unique keys):
select
table_schema,table_name
from
information_schema.columns
group by
table_schema,table_name
having
sum(if(column_key in ('PRI','UNI'), 1,0)) = 0;
This query got the job done, and it was quite quick as well. Then, since I was chatting with
Sheeri Cabral about other matters, I asked her if she could come up with an alternative solution. She suggested a
LEFT JOIN
between the
information_schema.tables
and
information_schema.statistics
, which I translated into this query:
select
t.table_schema, t.table_name
from
information_schema.tables t
left join information_schema. statistics s
on t.table_schema=s.table_schema and t.table_name=s.table_name
and s.non_unique=0
where
s.table_name is null;
This query works on the principle that it removes from the tables list all the ones for which there is no corresponding table in the
statistics
table. This query also works. Using both queries in a relatively empty server did not show any significant difference. But since I knew that I had to use this method on a very busy server, with a lot of tables, I quickly created 1,000 databases, each containing 5 tables, two of which did not have any primary or unique key. Now came the first surprise. The query with
GROUP BY
took about 0.5 seconds, while the one using
LEFT JOIN
used 11 seconds. I was about to congratulate myself for my acumen, when I realized that, in addition to schema and table names, I also needed the table
engine
. For the second query, that is not a problem. Adding the engine to the columns list works OK, and the query runs in 11 seconds like before. The first query, though, can't list the engine. There is no 'engine' in the
COLUMNS
table. So I needed a JOIN. Thus my query became
select
t.table_schema,t.table_name,engine
from
information_schema.tables t
inner join information_schema .columns c
on t.table_schema=c.table_schema and t.table_name=c.table_name
group by
t.table_schema,t.table_name
having
sum(if(column_key in ('PRI','UNI'), 1,0)) =0;
Guess what? This query ran in 17 seconds (!). So much for my instinct! Joins without keys are not efficient in MySQL, and tables in the information schema are no exception. If anyone has a more efficient method of getting a list of tables without primary key (the list including schema name, table name, and engine), I am curious to know.
No comments:
Post a Comment