redb icon indicating copy to clipboard operation
redb copied to clipboard

Include table stats in `DatabaseStats`

Open casey opened this issue 2 years ago • 7 comments

I'm auditing space usage in the ord index, and having information about individual tables would be super useful. In particular, the total size of each table in bytes, and how many keys it holds.

casey avatar Oct 13 '22 22:10 casey

Also, perhaps DatabaseStats could also check for space leaks, i.e. check that the number of pages reachable from current tables is the number of allocated pages, and include a statistic that had that information. The ordinals index is now a whopping 117G (last I checked it was 50G), so I'm curious if there's some kind of leak there.

casey avatar Oct 14 '22 00:10 casey

Would a separate method on Table to retrieve just that table's stats work for you? I'd rather not add it to DatabaseStats, because then it's size in memory would be unbounded.

Ya, a leak check makes sense. I've been planning to add a fsck() method that would also verify checksum integrity.

cberner avatar Oct 15 '22 00:10 cberner

Would a separate method on Table to retrieve just that table's stats work for you? I'd rather not add it to DatabaseStats, because then it's size in memory would be unbounded.

Sort of! The issue is then, I don't have a way of getting information on all tables in the database. I know what tables I think are in the database, and I can open them one by one and get stats about them, but I also want to see if there are any tables in the database I don't know about, and get information on those.

ReadTransaction::list_tables exists, but it returns strings, which can't be used to open tables. So I could see if there were other tables in the database that I didn't have definitions for, but I couldn't get definitions for them to open them and get the stats.

Ya, a leak check makes sense. I've been planning to add a fsck() method that would also verify checksum integrity.

That would be rad. A fsck that reported any problems/leaks it encountered would be sweet.

casey avatar Oct 15 '22 01:10 casey

Hmm, ya I see. How would you end up with a table that you didn't know about? It's not exactly pretty, but to find out the type of a table you can just open it with two random types and you'll get an error message telling you the types: https://github.com/cberner/redb/blob/754e3616827404283305cbdbb3154be6071f12bf/src/tree_store/table_tree.rs#L344

cberner avatar Oct 15 '22 02:10 cberner

I think it could actually happen pretty easily. For example, we release a version which changes the name of a table, but forget to remove the table with the old name from the database.

One idea is to have a separate Database::table_stats function which either returns Vec<TableStats>, or a Iterator<Item = TableStats>.

casey avatar Oct 15 '22 03:10 casey

Hmm, ok lemme think this over. Perhaps I can change list_tables to return structs that have the name, and also allow you to fetch the stats.

Alternately, what if I change delete_table to take a &str instead of needing a TableDefinition? That way you could list all the tables, diff against the expected tables, and delete any unexpected ones without needing to know their type

cberner avatar Oct 16 '22 18:10 cberner

Hmm, ok lemme think this over. Perhaps I can change list_tables to return structs that have the name, and also allow you to fetch the stats.

That sounds good.

Alternately, what if I change delete_table to take a &str instead of needing a TableDefinition? That way you could list all the tables, diff against the expected tables, and delete any unexpected ones without needing to know their type

That would be useful for cleaning up unexpected tables, but if we have unexpected tables, I'd also like to have stats (type, size, etc) to try figure out what they are and what they contain, instead of just blindly deleting them.

casey avatar Oct 17 '22 00:10 casey

@casey I'm finally getting around to this issue. Do you still need this API? I fixed the second issue (https://github.com/cberner/redb/pull/542), and I could add an API like open_table_untyped() which takes a impl TableHandle. That would return an UntypedTable instead of a Table which would only let you call APIs (like .stats()) that don't need to know the K & V types.

cberner avatar Mar 27 '23 23:03 cberner

Documenting my proposed API, but am going to close this for now.

struct UntypedTable;

impl UntypedTable {
    pub fn stats(&self) -> Result<TableStats> {
      ... 
   }
}

impl ReadOnlyTransaction {
   pub fn open_table_untyped(&self, handle: impl TableHandle) -> Result<UntypedTable> {
      ...
   }
}

cberner avatar Apr 08 '23 20:04 cberner

Sorry for not following up on this! Yah, this would be great. I just updated ord index info command to return table info, and it's pretty heinous.

Also, the TableStats type isn't showing up in the docs. For example, here the TableStats type is showing up in grey. Kind of odd. It think maybe the type is pub but it isn't exported from the crate.

casey avatar Nov 21 '23 23:11 casey

Np! Can you give the linked PR a try?

cberner avatar Nov 24 '23 23:11 cberner