aim icon indicating copy to clipboard operation
aim copied to clipboard

Allow sampling of sequence in any given range

Open alberttorosyan opened this issue 3 years ago • 0 comments

🚀 Feature

As of version 3.11, Aim sequential data can be stored in two different ways:

  1. Linear. where tracking step is used as a key/index for the value.
  2. Hashed. random deterministic hashing is applied to the tracking step and resulting value is used as a key for step and value

Second option allows to select sample of K elements from sequence of any length by just reading first K keys from sequence. This is good in terms of performance, however it has it's limitations since this approach cannot be applied to the range selections. Hence this method is applied to Metric sequences only, where we don't have range selection yet.

Need to implement a new algorithm of mapping sequence data to the storage which will allow effectively select K samples in a given steps range [start, stop] from a sequence of N elements.

Motivation

Improve performance of object sequence queries by implementing sampling on sequence sub-ranges which is memory efficient and performs well on long sequences (millions of steps) Enable range selection on metric sequences (for zoom-in functionality)

Additional context

Source pointers:

  • Implementation of Sequence class (base for metric and object sequences): https://github.com/aimhubio/aim/blob/ee6d0f1463c06b11dd5256dc6e1c9ba35400a274/aim/sdk/sequence.py#L200
  • Implementation of sequence tracking method: https://github.com/aimhubio/aim/blob/ee6d0f1463c06b11dd5256dc6e1c9ba35400a274/aim/sdk/tracker.py#L47

alberttorosyan avatar Jun 23 '22 17:06 alberttorosyan