ArcticDB
ArcticDB copied to clipboard
Read index API
trafficstars
Reference Issues/PRs
relates to: #399
What does this implement or fix?
Add unit tests to showcase the proposed API for reading the index of a symbol.
Proposal
Add new arcticdb.Library API
def read_index(symbol: str, as_of: Optional[AsOf] = None) -> VersionedItem
symbol and as_of have the same meaning as in arcticdb.Library.read
VersionedItem.data will contain the index retrieved from the symbol. Its type will be one of pandas.RangeIndex, pandas.DatetimeIndex, pandas.MultiIndex (it can be expanded to all supported index types) .
TBD
- Is optional
date_range/row_rangeargument needed? (I would add it only if someone requests it) - Do we support categorical indexes should they be considered in the first iteration of the task.
- Do we need to return
VersionedItem? Pros: It's consistent witharcticdb.Library.read. Cons: IfVersionedItemstarts to grow it can start carrying additional data that's not used
Alternatives
Extend arcticdb.Library.read with additional parameter index_only.
Pros:
- Less code duplication
- Move all read related stuff to one function
Cons:
VersionedItem.datawill be of different type depending on the parameter which might lead to confusion and bugsarcticdb.Library.readmight become huge and hard to understand and use
Any other comments?
Checklist
Checklist for code changes...
- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against all ArcticDB's features?
- [ ] Do all exceptions introduced raise appropriate error messages?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?
Mostly looks good, few comments/ideas:
- We have code to support string indexes, although I'm not sure how widely used it is. As a minimum, we need to behave sensibly for string indexed data.
- What should the behaviour be for the empty index type?
- We definitely want
date_range, and for completeness I think we should includerow_rangetoo. - We do not support categorical indexes, so these do not need to be considered.
- I think
VersionedItemis the correct return type, with the index in thedatafield. NativeVersionStorealready has a method calledread_index, so we should use something else for this. Maybeget_index?- I think it is better as a separate method, rather than an argument to
read. When selecting a subset of columns, you always get the index columns too without specifying them. We could have a special case, wherecolumns=[]gets just the index, but it isn't intuitive, and would change the return type ofdatabased on a parameter, which is nasty, particularly if the columns list is generated at runtime and might be empty. - We should support a batch version of the call