ArcticDB icon indicating copy to clipboard operation
ArcticDB copied to clipboard

Read index API

Open vasil-pashov opened this issue 9 months ago • 1 comments

Reference Issues/PRs

relates to: #399

What does this implement or fix?

Add unit tests to showcase the proposed API for reading the index of a symbol.

Proposal Add new arcticdb.Library API

def read_index(symbol: str, as_of: Optional[AsOf] = None) -> VersionedItem

symbol and as_of have the same meaning as in arcticdb.Library.read

VersionedItem.data will contain the index retrieved from the symbol. Its type will be one of pandas.RangeIndex, pandas.DatetimeIndex, pandas.MultiIndex (it can be expanded to all supported index types) .

TBD

  • Is optional date_range/row_range argument needed? (I would add it only if someone requests it)
  • Do we support categorical indexes should they be considered in the first iteration of the task.
  • Do we need to return VersionedItem? Pros: It's consistent with arcticdb.Library.read. Cons: If VersionedItem starts to grow it can start carrying additional data that's not used

Alternatives Extend arcticdb.Library.read with additional parameter index_only. Pros:

  • Less code duplication
  • Move all read related stuff to one function

Cons:

  • VersionedItem.data will be of different type depending on the parameter which might lead to confusion and bugs
  • arcticdb.Library.read might become huge and hard to understand and use

Any other comments?

Checklist

Checklist for code changes...
  • [ ] Have you updated the relevant docstrings, documentation and copyright notice?
  • [ ] Is this contribution tested against all ArcticDB's features?
  • [ ] Do all exceptions introduced raise appropriate error messages?
  • [ ] Are API changes highlighted in the PR description?
  • [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

vasil-pashov avatar May 10 '24 16:05 vasil-pashov

Mostly looks good, few comments/ideas:

  • We have code to support string indexes, although I'm not sure how widely used it is. As a minimum, we need to behave sensibly for string indexed data.
  • What should the behaviour be for the empty index type?
  • We definitely want date_range, and for completeness I think we should include row_range too.
  • We do not support categorical indexes, so these do not need to be considered.
  • I think VersionedItem is the correct return type, with the index in the data field.
  • NativeVersionStore already has a method called read_index, so we should use something else for this. Maybe get_index?
  • I think it is better as a separate method, rather than an argument to read. When selecting a subset of columns, you always get the index columns too without specifying them. We could have a special case, where columns=[] gets just the index, but it isn't intuitive, and would change the return type of data based on a parameter, which is nasty, particularly if the columns list is generated at runtime and might be empty.
  • We should support a batch version of the call

alexowens90 avatar May 13 '24 09:05 alexowens90