The Cost of Indexes in MySQL
I've always been troubled by the question - how bad is it to add another index to a table? Everyone knows that indexes help speed up queries but slow down data modifications in the table (because the indexes need to be rebuilt). With a large number of indexes, the time spent on their rebuilding during table modifications can be significant. Or not? I got tired of guessing and decided to check for myself.
I want to clarify right away that this test does not claim any accuracy or objectivity; it is merely a simple experiment that first came to my mind.
So, here's what we have:
MySQL. A table with ~40 columns and ~700K rows. There are so many columns firstly because they are needed, and secondly because the table is already old and was initially poorly normalized, making it difficult to restructure now. This is a real project. A function to add an order to the database. The order consists of 1000 items, all of which will be added to the table. Since I didn't know whether the indexes would be rebuilt synchronously or asynchronously, I decided to include measurements of the execution time of subsequent non-trivial selects in the test (if indexes are rebuilt asynchronously, this might be noticeable by the execution time of the selects). The table already had about 20 indexes.
Before each test run, I imported the table from scratch and restarted the MySQL server to clear the cache. After this, I conducted the test (i.e., when the database cache was still empty) and then conducted the test a second time (when part of the database was already cached).
I repeated the procedure 5 times to average the results and reduce the measurement error.
So, the results before adding additional indexes:
First run (empty database cache):
Adding 1000 rows to the table: 3808 ms
Subsequent 3 selects: 46, 9, and 302 ms respectively
Second run (something was already cached):
Adding 1000 rows to the table: 533 ms
Subsequent 3 selects: 1, 8, and 29 ms respectively
After this, I added 7 indexes (1-3 columns per index) to the table and repeated the tests:
First run (empty database cache):
Adding 1000 rows to the table: 4774 ms
Subsequent 3 selects: 37, 8, and 272 ms respectively
Second run (something was already cached):
Adding 1000 rows to the table: 614 ms
Subsequent 3 selects: 1, 7, and 22 ms respectively
So, it is immediately evident that the time to add rows to the table increased. The subsequent selects became slightly faster - I don't know why, they didn't use the new indexes; perhaps this is a measurement error.
The time to add 1000 rows to the table increased in the first test (with a clean database cache) by 25.4%, and in the second test (when the DBMS had already cached something) - by 15.2%. These figures are quite noticeable, although 7 added indexes are also a significant addition. If the obtained figures are divided by the number of indexes, one could say (very roughly) that adding an index to a table leads to a slowdown in data addition to the table by about 3%.
Then I thought that the database initially had about 20 indexes, and it would be more correct to look at the speed of adding records without any indexes at all. No problem, I dropped all indexes except the PRIMARY and 3 indexes necessary for FK:
First run (empty database cache):
Adding 1000 rows to the table: 1396 ms
Second run (something was already cached):
Adding 1000 rows to the table: 391 ms
I don't provide selects because without indexes they became tens to hundreds of times slower, nothing surprising.
Using the results of the second test (it better reflects the real work of the DBMS when data is cached), one can calculate that adding 1 index to the table leads to a decrease in insert speed by an average of 2.1%.
Again: I absolutely do not claim the correctness and accuracy of my tests, but this is better than nothing, better than just guessing.
Draw your own conclusions.
P.S. If anyone wonders why I didn't test update and delete, I'll answer - just too lazy, I already spent 1.5 hours on this testing :)
Comments
Post a Comment