Dolt doesn't use indexed lookup on top N query. Very slow query resulting in DoltHub timeouts

Open coffeegoddd opened this issue 2 years ago • 1 comments

Dolt v1.3.0 on Hosted Dolt.

DoltHub uses a table called password_attempts with the following definition:

| password_attempts | CREATE TABLE `password_attempts` (
  `id` varchar(36) NOT NULL,
  `user_id_fk` varchar(36) NOT NULL,
  `event_type` int,
  `ip_address` varchar(255),
  `user_agent` varchar(3096),
  `created_at` timestamp(6),
  PRIMARY KEY (`id`),
  KEY `password_attempts_user_id_fk` (`user_id_fk`,`created_at`),
  KEY `user_id_fk` (`user_id_fk`),
  CONSTRAINT `1jjafe08` FOREIGN KEY (`user_id_fk`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |

This table currently has ~480k rows.

This query takes > 4 seconds to complete in Dolt:

SELECT * FROM password_attempts WHERE (user_id_fk = 'some-id') ORDER BY created_at DESC LIMIT 3;

This query take < 1ms in Postgres (a replica of the Hosted Dolt instance).

Here is the plan output:

mysql> EXPLAIN SELECT * FROM password_attempts WHERE (user_id_fk = '60deee19-e995-4250-b3f8-f9eb4308b909') ORDER BY created_at DESC LIMIT 3;
+---------------------------------------------------------------------------------------------------------------+
| plan                                                                                                          |
+---------------------------------------------------------------------------------------------------------------+
| Limit(3)                                                                                                      |
|  ââ TopN(Limit: [3]; password_attempts.created_at DESC)                                                   |
|      ââ Filter                                                                                            |
|          ââ (password_attempts.user_id_fk = 'some-id')                       |
|          ââ IndexedTableAccess(password_attempts)                                                         |
|              ââ index: [password_attempts.user_id_fk]                                                     |
|              ââ filters: [{[some-id, some-id]}] |
|              ââ columns: [id user_id_fk event_type ip_address user_agent created_at]                      |
+---------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

Jun 26 '23 20:06 coffeegoddd

This is still valid, we should be opportunistically using the second index to eliminate the SORT. @jycor this is related to some prior SORT elimination, and I think this should be low effort because the match case is so specific, functional dependency for the second index is hasMax1Row and will be sorted by the created_at field .

Apr 16 '24 00:04 max-hoffman