Allow sampling of sequence in any given range
🚀 Feature
As of version 3.11, Aim sequential data can be stored in two different ways:
- Linear: the tracking step is used as the key/index for the value.
- Hashed: a deterministic pseudo-random hash is applied to the tracking step, and the resulting value is used as the key for the step and value.
The second option makes it possible to select a sample of K elements from a sequence of any length by reading just the first K keys. This performs well, but it has its limitations: the approach cannot be applied to range selections. Hence it is used for Metric sequences only, where range selection is not supported yet.
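To make the trade-off concrete, here is a minimal sketch of the hashed layout. It is not Aim's actual code: `hash_step`, `build_hashed_store`, and the use of `blake2b` are assumptions made for illustration, and an ordered dict stands in for the ordered key-value storage. Reading the first K keys gives a pseudo-random sample over the whole sequence, while a range-restricted sample has to scan keys until enough of them fall into the range.

```python
import hashlib
from itertools import islice


def hash_step(step: int) -> int:
    """Deterministic pseudo-random 64-bit key (hypothetical hash, not Aim's)."""
    digest = hashlib.blake2b(step.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def build_hashed_store(values):
    """Map hashed step -> (step, value), ordered by key like an ordered KV store."""
    items = {hash_step(step): (step, value) for step, value in enumerate(values)}
    return dict(sorted(items.items()))


def sample_first_k(store, k):
    """Read only the first k keys; return (step, value) pairs sorted by step."""
    return sorted(islice(store.values(), k))


def sample_in_range(store, k, start, stop):
    """Range sampling on the hashed layout: scan keys until k steps fall in range."""
    picked, scanned = [], 0
    for step, value in store.values():
        scanned += 1
        if start <= step <= stop:
            picked.append((step, value))
            if len(picked) == k:
                break
    return sorted(picked), scanned


store = build_hashed_store(range(10_000))
print(sample_first_k(store, 10))          # 10 steps spread over the whole sequence
picked, scanned = sample_in_range(store, 10, 100, 199)
print(len(picked), "picked after scanning", scanned, "keys")
```

The number of keys scanned grows as the range gets narrower relative to the full sequence, which is why the first-K trick does not carry over to range selection.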
We need a new algorithm for mapping sequence data to storage that makes it possible to efficiently select K samples within a given step range [start, stop] from a sequence of N elements.
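As a reference for the expected result only (not a proposed storage design), the sketch below picks K roughly evenly spaced steps in [start, stop] from an in-memory sorted list of steps. The function name `sample_range` and the even-spacing policy are assumptions; the real implementation would need to get a comparable result without loading all steps of the range.

```python
from bisect import bisect_left, bisect_right


def sample_range(steps, k, start, stop):
    """Pick up to k roughly evenly spaced steps from sorted `steps` in [start, stop]."""
    lo = bisect_left(steps, start)    # first index with step >= start
    hi = bisect_right(steps, stop)    # one past the last index with step <= stop
    n = hi - lo
    if n <= 0 or k <= 0:
        return []
    if n <= k:
        return steps[lo:hi]           # fewer candidates than requested: return all
    if k == 1:
        return [steps[lo]]
    # Spread k indices across the n in-range positions, including both ends.
    return [steps[lo + (i * (n - 1)) // (k - 1)] for i in range(k)]


steps = list(range(0, 1000, 2))       # steps need not be contiguous
print(sample_range(steps, 5, 100, 200))  # [100, 124, 150, 174, 200]
```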
Motivation
- Improve performance of object sequence queries by implementing sampling on sequence sub-ranges that is memory-efficient and performs well on long sequences (millions of steps).
- Enable range selection on metric sequences (for zoom-in functionality).
Additional context
Source pointers:
- Implementation of the Sequence class (base for metric and object sequences): https://github.com/aimhubio/aim/blob/ee6d0f1463c06b11dd5256dc6e1c9ba35400a274/aim/sdk/sequence.py#L200
- Implementation of the sequence tracking method: https://github.com/aimhubio/aim/blob/ee6d0f1463c06b11dd5256dc6e1c9ba35400a274/aim/sdk/tracker.py#L47