
User facing API to analyze library size

Open poodlewars opened this issue 1 year ago • 1 comments

NativeVersionStore.version_store has two ways to scan the sizes of a library:

  • scan_object_sizes
  • scan_object_sizes_by_stream

The former returns a dict from key types to KeySizesInfo objects, which have the fields count, compressed_size, and uncompressed_size. The latter returns a dict from symbol names to dicts of the same form as those returned by scan_object_sizes (see test_symbol_sizes.py for clarifying examples).
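To make the two return shapes concrete, here is a minimal sketch that models them with plain Python objects. The KeySizesInfo dataclass below is a stand-in written for this example, not ArcticDB's own class; the key type names are plain strings and all numbers are made up. It only illustrates the nesting described above and how the per-symbol form can be rolled up into the per-key-type form.

```python
from dataclasses import dataclass


@dataclass
class KeySizesInfo:
    """Stand-in for illustration only; mirrors the three fields named in the issue."""
    count: int
    compressed_size: int
    uncompressed_size: int


# Shape described for scan_object_sizes: key type -> KeySizesInfo
# (key types are plain strings here; that is an assumption of this sketch).
by_key_type = {
    "TABLE_DATA": KeySizesInfo(4, 4000, 16000),
    "TABLE_INDEX": KeySizesInfo(2, 200, 500),
}

# Shape described for scan_object_sizes_by_stream:
# symbol name -> {key type -> KeySizesInfo}
by_stream = {
    "sym_a": {
        "TABLE_DATA": KeySizesInfo(3, 3000, 12000),
        "TABLE_INDEX": KeySizesInfo(1, 100, 250),
    },
    "sym_b": {
        "TABLE_DATA": KeySizesInfo(1, 1000, 4000),
        "TABLE_INDEX": KeySizesInfo(1, 100, 250),
    },
}


def rollup_by_key_type(per_symbol):
    """Sum the per-symbol breakdown into a single key type -> KeySizesInfo dict."""
    out = {}
    for per_key in per_symbol.values():
        for key_type, info in per_key.items():
            acc = out.setdefault(key_type, KeySizesInfo(0, 0, 0))
            acc.count += info.count
            acc.compressed_size += info.compressed_size
            acc.uncompressed_size += info.uncompressed_size
    return out
```

In this made-up data the rollup happens to equal by_key_type; whether that holds for a real library (for example, for keys not attached to any symbol) is not something this sketch claims.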

The task is to add library tool method(s?) to:

  • Get the size on disk (i.e. compressed sizes only) of:
    • Whole library
    • Whole library broken down by key types (what scan_object_sizes does right now)
    • Whole library broken down by symbol and key types (what scan_object_sizes_by_stream does right now)
    • One key type
    • One symbol
    • One symbol broken down by key type
    • Symbols matching a regex
  • All of the above, including uncompressed sizes
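One way the breakdowns listed above could be derived is by post-processing a symbol -> key type -> sizes mapping. The sketch below is hypothetical throughout: Sizes, library_total, for_key_type, for_symbol and for_symbols_matching are names invented for this example, not existing or proposed ArcticDB methods, and the sample data is made up.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass(frozen=True)
class Sizes:
    """Hypothetical stand-in with the same three fields as KeySizesInfo."""
    count: int = 0
    compressed_size: int = 0
    uncompressed_size: int = 0

    def __add__(self, other: "Sizes") -> "Sizes":
        return Sizes(
            self.count + other.count,
            self.compressed_size + other.compressed_size,
            self.uncompressed_size + other.uncompressed_size,
        )

BySymbol = Dict[str, Dict[str, Sizes]]  # symbol -> key type -> sizes

def _total(infos: Iterable[Sizes]) -> Sizes:
    result = Sizes()
    for info in infos:
        result = result + info
    return result

def library_total(by_symbol: BySymbol) -> Sizes:
    """Whole library: sum over every symbol and key type."""
    return _total(i for per_key in by_symbol.values() for i in per_key.values())

def for_key_type(by_symbol: BySymbol, key_type: str) -> Sizes:
    """One key type, summed across all symbols that have it."""
    return _total(pk[key_type] for pk in by_symbol.values() if key_type in pk)

def for_symbol(by_symbol: BySymbol, symbol: str) -> Sizes:
    """One symbol, summed across its key types (zero sizes if unknown)."""
    return _total(by_symbol.get(symbol, {}).values())

def for_symbols_matching(by_symbol: BySymbol, pattern: str) -> BySymbol:
    """Per-symbol breakdown restricted to names where re.search finds the pattern."""
    regex = re.compile(pattern)
    return {s: pk for s, pk in by_symbol.items() if regex.search(s)}

# Made-up sample data for the usage lines below.
SAMPLE: BySymbol = {
    "fx/eurusd": {"TABLE_DATA": Sizes(3, 3000, 12000), "TABLE_INDEX": Sizes(1, 100, 250)},
    "fx/gbpusd": {"TABLE_DATA": Sizes(1, 1000, 4000)},
    "eq/aapl": {"TABLE_DATA": Sizes(2, 500, 900), "TABLE_INDEX": Sizes(1, 50, 80)},
}

on_disk = library_total(SAMPLE).compressed_size          # compressed bytes only
fx_only = library_total(for_symbols_matching(SAMPLE, r"^fx/"))
```

Returning the full Sizes object from every helper keeps the "compressed only" and "including uncompressed" variants as a choice of which field the caller reads, rather than two sets of methods. Filtering after a full scan is the simplest option but does not avoid the cost of scanning the whole library.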

The existing methods were also observed to crash when run against an internal library at Man Group; this should be investigated as well.

poodlewars, Mar 04 '24 12:03

When delayed deletes run concurrently with scan_object_sizes, the current behaviour is to throw an exception:

StorageException: Not found: Composite: o:__write__*0*1714053087144170821:8986477046003934300:0x70@8725724283296188921[tafl_features/1min/AC,.MF]

ArcticDB should just swallow KeyNotFoundException and return the objects that are available to read.
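The suggested behaviour can be sketched in isolation as follows. This is not ArcticDB code: KeyNotFoundError, scan_sizes and the dict-backed store are hypothetical stand-ins, used only to show a scan that skips keys deleted between listing and reading instead of failing.

```python
class KeyNotFoundError(Exception):
    """Hypothetical stand-in for the storage layer's not-found error."""


def scan_sizes(keys, read_size):
    """Sum sizes for the keys that can still be read.

    Keys that vanish between listing and reading are skipped and reported
    back to the caller rather than aborting the whole scan.
    """
    total = 0
    skipped = []
    for key in keys:
        try:
            total += read_size(key)
        except KeyNotFoundError:
            skipped.append(key)
    return total, skipped


# Simulate a delete racing with the scan: "k2" was listed, then removed.
listed_keys = ["k1", "k2", "k3"]
store = {"k1": 10, "k3": 30}


def read_size(key):
    try:
        return store[key]
    except KeyError:
        raise KeyNotFoundError(key) from None


total, skipped = scan_sizes(listed_keys, read_size)
```

Returning the skipped keys alongside the total is a choice made for this sketch; it lets a caller tell a complete scan from a partial one, which silently swallowing the exception would not.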

muhammadhamzasajjad, Jun 25 '24 09:06