ArcticDB
User-facing API to analyze library size
NativeVersionStore.version_store has two ways to scan the sizes of a library:
- scan_object_sizes
- scan_object_sizes_by_stream
The former returns a dict from key types to KeySizesInfo objects, each of which has the fields count, compressed_size, and uncompressed_size. The latter returns a dict from symbol names to dicts of the same form returned by scan_object_sizes (see test_symbol_sizes.py for clarifying examples).
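To make the two return shapes concrete, here is a minimal sketch that models them with plain Python objects. It does not call ArcticDB: the KeySizesInfo class below is a stand-in carrying only the three fields named above, and the key type names, symbol names, and numbers are made up for illustration (the real methods presumably key by the library's own key type values rather than strings).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeySizesInfo:
    """Stand-in for the real KeySizesInfo; only the three documented fields."""
    count: int
    compressed_size: int
    uncompressed_size: int


# Shape described for scan_object_sizes: key type -> KeySizesInfo.
# Key type names and sizes are illustrative placeholders.
by_key_type = {
    "TABLE_DATA": KeySizesInfo(count=4, compressed_size=400, uncompressed_size=1000),
    "TABLE_INDEX": KeySizesInfo(count=2, compressed_size=60, uncompressed_size=90),
}

# Shape described for scan_object_sizes_by_stream:
# symbol name -> (key type -> KeySizesInfo).
by_stream = {
    "sym_a": {
        "TABLE_DATA": KeySizesInfo(3, 300, 750),
        "TABLE_INDEX": KeySizesInfo(1, 30, 45),
    },
    "sym_b": {
        "TABLE_DATA": KeySizesInfo(1, 100, 250),
        "TABLE_INDEX": KeySizesInfo(1, 30, 45),
    },
}

# Reading the fields: per-key-type ratio of uncompressed to compressed bytes.
compression_ratio = {
    key_type: info.uncompressed_size / info.compressed_size
    for key_type, info in by_key_type.items()
}
```

In this made-up data the per-symbol entries sum to the per-key-type entries, which is the relationship one would expect if every key belongs to exactly one symbol; that is an assumption of the sketch, not something the issue states.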
The task is to add library tool method(s?) to:
- Get the size on disk (i.e. compressed sizes only) of:
  - Whole library
  - Whole library broken down by key types (what scan_object_sizes does right now)
  - Whole library broken down by symbol and key types (what scan_object_sizes_by_stream does right now)
  - One key type
  - One symbol
  - One symbol broken down by key type
  - Symbols matching a regex
- All of the above, including uncompressed sizes
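One possible way to get most of these breakdowns is to derive them from a single symbol-by-key-type result. The sketch below works on plain dicts shaped like the by-stream output; all function names (whole_library, one_key_type, one_symbol, symbols_matching, and so on) are hypothetical and not part of ArcticDB, and KeySizesInfo is a stand-in class. It also assumes every key of interest is attributed to some symbol, which may not hold for library-level keys.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KeySizesInfo:
    """Stand-in with the three documented fields, plus addition for rollups."""
    count: int
    compressed_size: int
    uncompressed_size: int

    def __add__(self, other):
        return KeySizesInfo(
            self.count + other.count,
            self.compressed_size + other.compressed_size,
            self.uncompressed_size + other.uncompressed_size,
        )


ZERO = KeySizesInfo(0, 0, 0)


def total(infos):
    """Sum an iterable of KeySizesInfo into one."""
    result = ZERO
    for info in infos:
        result = result + info
    return result


def library_by_key_type(by_stream):
    """Whole library broken down by key type."""
    grouped = defaultdict(lambda: ZERO)
    for per_key_type in by_stream.values():
        for key_type, info in per_key_type.items():
            grouped[key_type] = grouped[key_type] + info
    return dict(grouped)


def whole_library(by_stream):
    return total(library_by_key_type(by_stream).values())


def one_key_type(by_stream, key_type):
    return library_by_key_type(by_stream).get(key_type, ZERO)


def one_symbol(by_stream, symbol):
    """One symbol, summed over its key types; by_stream[symbol] is the breakdown."""
    return total(by_stream.get(symbol, {}).values())


def symbols_matching(by_stream, pattern):
    """Per-symbol breakdowns for symbols whose name matches the regex."""
    regex = re.compile(pattern)
    return {s: v for s, v in by_stream.items() if regex.search(s)}


# Made-up data: symbol -> key type -> KeySizesInfo.
by_stream = {
    "eq/AAPL": {"TABLE_DATA": KeySizesInfo(3, 300, 750),
                "TABLE_INDEX": KeySizesInfo(1, 30, 45)},
    "eq/MSFT": {"TABLE_DATA": KeySizesInfo(1, 100, 250)},
    "fx/EURUSD": {"TABLE_DATA": KeySizesInfo(2, 50, 80)},
}

size_on_disk = whole_library(by_stream).compressed_size
equities = symbols_matching(by_stream, r"^eq/")
```

Because each result carries both compressed_size and uncompressed_size, the "size on disk" and "including uncompressed sizes" variants differ only in which field the caller reads. Deriving everything from one full scan is simple but scans the whole library even for a single symbol; a real implementation would probably want to filter before reading.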
A crash was also observed when running the existing methods against an internal library at Man Group; this should be investigated as well.
When delayed deletes are concurrent with scan_object_sizes, current behaviour throws an exception:
StorageException: Not found: Composite: o:__write__*0*1714053087144170821:8986477046003934300:0x70@8725724283296188921[tafl_features/1min/AC,.MF]
ArcticDB should just swallow KeyNotFoundException and return the objects that are available to read.
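The intended behaviour can be sketched in a few lines. This is only an illustration of the control flow in Python, not the actual scanning code: the KeyNotFoundException class, scan_sizes, and read_size below are hypothetical stand-ins, and a dict plays the role of the storage backend.

```python
class KeyNotFoundException(Exception):
    """Hypothetical stand-in for the storage-layer 'key not found' error."""


def scan_sizes(keys, read_size):
    """Sum sizes of the keys that can still be read.

    Keys that disappear between listing and reading (for example because a
    delayed delete ran concurrently) are skipped instead of aborting the scan.
    """
    total = 0
    skipped = []
    for key in keys:
        try:
            total += read_size(key)
        except KeyNotFoundException:
            skipped.append(key)  # tolerate the race, keep scanning
    return total, skipped


# Toy storage: key -> size in bytes.
store = {"k1": 10, "k2": 20, "k3": 30}


def read_size(key):
    try:
        return store[key]
    except KeyError:
        raise KeyNotFoundException(key) from None


keys = list(store)   # the scan lists keys first
del store["k2"]      # a delete lands before the key is read
total, skipped = scan_sizes(keys, read_size)
```

Only the "not found" case is caught here; any other storage error still propagates. Returning the skipped keys is an extra choice made in this sketch so callers can tell the result may be partial.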