Mysql optimizer team blog

5/18/2023

Let’s say hypothetically the optimization somehow failed to work again (perhaps because months have gone by and the table has more data now). This was shocking and embarrassing at the same time. We check the slow query logs, and-surprise, surprise-it was the exact query I had optimized. And then the same customer comes back with saying that they just upgraded to the release containing the fix, and everything just became unbearably slow. I have nearly forgotten about this whole thing. I pat myself on the back for a job well done, send a PR, and call it a day. ORDER BY ChannelId, DeleteAt, CreateAt DESCĪnd they come back saying it ran successfully in the expected time. With much trepidation, I asked the customer to try out the result of: SELECT Id FROM Posts WHERE ChannelId = '9tne5g44z7f1zn4z1whebb7jna' So I’d better use the multi-column index.” And that’s exactly what we want! But now MySQL goes: “Aha, now I have to sort by these three columns. Because ChannelId and DeleteAt are equality checks. Since the choice of the index is dictated by the “ORDER BY” clause rather than the “WHERE” clause, what if we can include that decision in the “ORDER BY” clause itself? If we change ORDER BY CreateAt to ORDER BY ChannelId, DeleteAt, CreateAt, the query result remains exactly the same. Now that we understand the problem, how can we coax MySQL into choosing the right index? Well, if MySQL is acting smart, we can outsmart it. Whereas actually, using the multi-column index leads to scanning fewer rows, which gets sorted in practically no time at all. MySQL tries to be smart and decides that although using the CreateAt index might have to scan through more rows, it avoids the sorting at the end. But after some Googling, I had a pretty good theory of what was happening. I wasn’t able to reproduce the problem locally, so it wasn’t feasible to try different variations of the query. But then, why is MySQL choosing the wrong index in the first place? Using USE INDEX was the nuclear button which I didn’t want to use unless there were no other options.

Simply because the query filters by all three columns which leads to scanning a much smaller dataset. It becomes clear why choosing the multi-column index is faster than choosing the one for CreateAt. And then ordering by CreateAt and just getting the first row. We’re filtering the table by three columns: ChannelId, DeleteAt, and CreateAt. SELECT Id FROM Posts WHERE ChannelId = 'x' AND DeleteAt = y AND CreateAt < z Another involving ChannelId, DeleteAt, and CreateAt. Each of them have their individual indices, and there are two additional multi-column indices. Our main columns of interest are CreateAt, DeleteAt, UpdateAt, and ChannelId. But it contains the essential elements for us to understand the problem. This is an abridged version of the Posts table for brevity. KEY `idx_posts_channel_id_delete_at_create_at` (`ChannelId`,`DeleteAt`,`CreateAt`),įULLTEXT KEY `idx_posts_message_txt` (`Message`) KEY `idx_posts_channel_id_update_at` (`ChannelId`,`UpdateAt`), KEY `idx_posts_channel_id` (`ChannelId`), The query was: SELECT Id FROM Posts WHERE ChannelId = '9tne5g44z7f1zn4z1whebb7jna' It started off with a customer noticing that a SQL query was running slowly in their environment. This is the story of an (apparently) smart optimization to a SQL query that backfired spectacularly-and how we finally fixed it. In that case, allow me to regale the uninitiated reader. If you’re a MySQL veteran and have read the title, you already know where this is heading □. Optimizing SQL queries is always fun-except when it isn’t.

0 Comments

BLOG

Mysql optimizer team blog

Leave a Reply.

Author

Archives

Categories