`analyze table` returns ok but does not generate stats
Potentially related to: https://github.com/dolthub/dolt/issues/8324
$ du -h
4.0K ./.doltcfg
479G ./.dolt/noms/oldgen
479G ./.dolt/noms
0B ./.dolt/temptf
0B ./.dolt/stats/.dolt/noms/oldgen
1.0M ./.dolt/stats/.dolt/noms
0B ./.dolt/stats/.dolt/temptf
1.0M ./.dolt/stats/.dolt
1.0M ./.dolt/stats
479G ./.dolt
654G .
$ dolt sql -q "analyze table actor, revision, comment"
+----------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------+---------+----------+----------+
| actor | analyze | status | OK |
| revision | analyze | status | OK |
| comment | analyze | status | OK |
+----------+---------+----------+----------+
$ du -h
4.0K ./.doltcfg
479G ./.dolt/noms/oldgen
479G ./.dolt/noms
0B ./.dolt/temptf
0B ./.dolt/stats/.dolt/noms/oldgen
1.0M ./.dolt/stats/.dolt/noms
0B ./.dolt/stats/.dolt/temptf
1.0M ./.dolt/stats/.dolt
1.0M ./.dolt/stats
479G ./.dolt
654G .
$ dolt sql -q "select count(*) from dolt_statistics;"
+----------+
| count(*) |
+----------+
| 0 |
+----------+
These are large tables:
$ dolt sql -q "select count(*) from actor"
+----------+
| count(*) |
+----------+
| 290054 |
+----------+
$ dolt sql -q "select count(*) from revision"
+----------+
| count(*) |
+----------+
| 4013041 |
+----------+
$ dolt sql -q "select count(*) from comment"
+----------+
| count(*) |
+----------+
| 1409589 |
+----------+
Seems like something should have gotten written to disk.
Note, when stats were generated using call dolt_stats_restart() the query I was trying to change the plan for that used these three tables started to execute fast.
Here's the query:
$ dolt sql -q " SELECT rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,actor_rev_user.actor_user AS \`rev_user\`,actor_rev_user.actor_name AS \`rev_user_text\`,rev_actor,comment_rev_comment.comment_text AS \`rev_comment_text\`,comment_rev_comment.comment_data AS \`rev_comment_data\`,comment_rev_comment.comment_id AS \`rev_comment_cid\` FROM \`revision\` JOIN \`actor\` \`actor_rev_user\` ON ((actor_rev_user.actor_id = rev_actor)) JOIN \`comment\` \`comment_rev_comment\` ON ((comment_rev_comment.comment_id = rev_comment_id)) WHERE rev_page = 21990 AND (rev_timestamp <= '20040219154224') ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 1"
Here is the old plan:
$ dolt sql -q "explain SELECT rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,actor_rev_user.actor_user AS \`rev_user\`,actor_rev_user.actor_name AS \`rev_user_text\`,rev_actor,comment_rev_comment.comment_text AS \`rev_comment_text\`,comment_rev_comment.comment_data AS \`rev_comment_data\`,comment_rev_comment.comment_id AS \`rev_comment_cid\` FROM \`revision\` JOIN \`actor\` \`actor_rev_user\` ON ((actor_rev_user.actor_id = rev_actor)) JOIN \`comment\` \`comment_rev_comment\` ON ((comment_rev_comment.comment_id = rev_comment_id)) WHERE rev_page = 21990 AND (rev_timestamp <= '20040219154224') ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 1"
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Limit(1) |
| └─ Project |
| ├─ columns: [revision.rev_id, revision.rev_page, revision.rev_timestamp, revision.rev_minor_edit, revision.rev_deleted, revision.rev_len, revision.rev_parent_id, revision.rev_sha1, actor_rev_user.actor_user as rev_user, actor_rev_user.actor_name as rev_user_text, revision.rev_actor, comment_rev_comment.comment_text as rev_comment_text, comment_rev_comment.comment_data as rev_comment_data, comment_rev_comment.comment_id as rev_comment_cid] |
| └─ Sort(revision.rev_timestamp DESC, revision.rev_id DESC) |
| └─ Project |
| ├─ columns: [revision.rev_id, revision.rev_page, revision.rev_comment_id, revision.rev_actor, revision.rev_timestamp, revision.rev_minor_edit, revision.rev_deleted, revision.rev_len, revision.rev_parent_id, revision.rev_sha1, actor_rev_user.actor_id, actor_rev_user.actor_user, actor_rev_user.actor_name, comment_rev_comment.comment_id, comment_rev_comment.comment_hash, comment_rev_comment.comment_text, comment_rev_comment.comment_data, actor_rev_user.actor_user as rev_user, actor_rev_user.actor_name as rev_user_text, comment_rev_comment.comment_text as rev_comment_text, comment_rev_comment.comment_data as rev_comment_data, comment_rev_comment.comment_id as rev_comment_cid] |
| └─ HashJoin |
| ├─ (actor_rev_user.actor_id = revision.rev_actor) |
| ├─ LookupJoin |
| │ ├─ (comment_rev_comment.comment_id = revision.rev_comment_id) |
| │ ├─ TableAlias(comment_rev_comment) |
| │ │ └─ Table |
| │ │ ├─ name: comment |
| │ │ └─ columns: [comment_id comment_hash comment_text comment_data] |
| │ └─ Filter |
| │ ├─ ((revision.rev_page = 21990) AND (revision.rev_timestamp <= '20040219154224')) |
| │ └─ IndexedTableAccess(revision) |
| │ ├─ index: [revision.rev_page,revision.rev_timestamp] |
| │ ├─ columns: [rev_id rev_page rev_comment_id rev_actor rev_timestamp rev_minor_edit rev_deleted rev_len rev_parent_id rev_sha1] |
| │ └─ keys: 21990 |
| └─ HashLookup |
| ├─ left-key: (revision.rev_actor) |
| ├─ right-key: (actor_rev_user.actor_id) |
| └─ TableAlias(actor_rev_user) |
| └─ Table |
| ├─ name: actor |
| └─ columns: [actor_id actor_user actor_name] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
And here is the improved plan after stats were collected:
$ dolt sql -q "explain SELECT rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,actor_rev_user.actor_user AS \`rev_user\`,actor_rev_user.actor_name AS \`rev_user_text\`,rev_actor,comment_rev_comment.comment_text AS \`rev_comment_text\`,comment_rev_comment.comment_data AS \`rev_comment_data\`,comment_rev_comment.comment_id AS \`rev_comment_cid\` FROM \`revision\` JOIN \`actor\` \`actor_rev_user\` ON ((actor_rev_user.actor_id = rev_actor)) JOIN \`comment\` \`comment_rev_comment\` ON ((comment_rev_comment.comment_id = rev_comment_id)) WHERE rev_page = 21990 AND (rev_timestamp <= '20040219154224') ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 1"
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Limit(1) |
| └─ Project |
| ├─ columns: [revision.rev_id, revision.rev_page, revision.rev_timestamp, revision.rev_minor_edit, revision.rev_deleted, revision.rev_len, revision.rev_parent_id, revision.rev_sha1, actor_rev_user.actor_user as rev_user, actor_rev_user.actor_name as rev_user_text, revision.rev_actor, comment_rev_comment.comment_text as rev_comment_text, comment_rev_comment.comment_data as rev_comment_data, comment_rev_comment.comment_id as rev_comment_cid] |
| └─ Sort(revision.rev_timestamp DESC, revision.rev_id DESC) |
| └─ Project |
| ├─ columns: [revision.rev_id, revision.rev_page, revision.rev_comment_id, revision.rev_actor, revision.rev_timestamp, revision.rev_minor_edit, revision.rev_deleted, revision.rev_len, revision.rev_parent_id, revision.rev_sha1, actor_rev_user.actor_id, actor_rev_user.actor_user, actor_rev_user.actor_name, comment_rev_comment.comment_id, comment_rev_comment.comment_hash, comment_rev_comment.comment_text, comment_rev_comment.comment_data, actor_rev_user.actor_user as rev_user, actor_rev_user.actor_name as rev_user_text, comment_rev_comment.comment_text as rev_comment_text, comment_rev_comment.comment_data as rev_comment_data, comment_rev_comment.comment_id as rev_comment_cid] |
| └─ LookupJoin |
| ├─ LookupJoin |
| │ ├─ IndexedTableAccess(revision) |
| │ │ ├─ index: [revision.rev_page,revision.rev_actor,revision.rev_timestamp] |
| │ │ ├─ filters: [{[21990, 21990], [NULL, ∞), (NULL, [50 48 48 52 48 50 49 57 49 53 52 50 50 52]]}] |
| │ │ └─ columns: [rev_id rev_page rev_comment_id rev_actor rev_timestamp rev_minor_edit rev_deleted rev_len rev_parent_id rev_sha1] |
| │ └─ TableAlias(actor_rev_user) |
| │ └─ IndexedTableAccess(actor) |
| │ ├─ index: [actor.actor_id] |
| │ ├─ columns: [actor_id actor_user actor_name] |
| │ └─ keys: revision.rev_actor |
| └─ TableAlias(comment_rev_comment) |
| └─ IndexedTableAccess(comment) |
| ├─ index: [comment.comment_id] |
| ├─ columns: [comment_id comment_hash comment_text comment_data] |
| └─ keys: revision.rev_comment_id
I have not been able to repro (1) with/without fixing dolt_stats_restart() db location, (2) while a server is/is not running. I did find and fixed some dropped errors on the read/write paths.
If we catch this happening again the thing we want to do is zip and save .dolt/stats/.dolt/noms. If a write fails silently we will notice the journal missing data. Otherwise it's some in-process specific tracking issue, like branch/database name/database location. All three databases returning OK but no stats makes me think there was some lock conflict that rejected the write, like a LOCK file left behind or a disk permissions issue.