ArcticDB
ArcticDB copied to clipboard
Read index API
Reference Issues/PRs
relates to: #399
What does this implement or fix?
Add unit tests to showcase the proposed API for reading the index of a symbol.
Proposal
Add new arcticdb.Library
API
def read_index(symbol: str, as_of: Optional[AsOf] = None) -> VersionedItem
symbol
and as_of
have the same meaning as in arcticdb.Library.read
VersionedItem.data
will contain the index retrieved from the symbol. Its type will be one of pandas.RangeIndex
, pandas.DatetimeIndex
, pandas.MultiIndex
(it can be expanded to all supported index types) .
TBD
- Is optional
date_range
/row_range
argument needed? (I would add it only if someone requests it) - Do we support categorical indexes should they be considered in the first iteration of the task.
- Do we need to return
VersionedItem
? Pros: It's consistent witharcticdb.Library.read
. Cons: IfVersionedItem
starts to grow it can start carrying additional data that's not used
Alternatives
Extend arcticdb.Library.read
with additional parameter index_only
.
Pros:
- Less code duplication
- Move all read related stuff to one function
Cons:
-
VersionedItem.data
will be of different type depending on the parameter which might lead to confusion and bugs -
arcticdb.Library.read
might become huge and hard to understand and use
Any other comments?
Checklist
Checklist for code changes...
- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against all ArcticDB's features?
- [ ] Do all exceptions introduced raise appropriate error messages?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?
Mostly looks good, few comments/ideas:
- We have code to support string indexes, although I'm not sure how widely used it is. As a minimum, we need to behave sensibly for string indexed data.
- What should the behaviour be for the empty index type?
- We definitely want
date_range
, and for completeness I think we should includerow_range
too. - We do not support categorical indexes, so these do not need to be considered.
- I think
VersionedItem
is the correct return type, with the index in thedata
field. -
NativeVersionStore
already has a method calledread_index
, so we should use something else for this. Maybeget_index
? - I think it is better as a separate method, rather than an argument to
read
. When selecting a subset of columns, you always get the index columns too without specifying them. We could have a special case, wherecolumns=[]
gets just the index, but it isn't intuitive, and would change the return type ofdata
based on a parameter, which is nasty, particularly if the columns list is generated at runtime and might be empty. - We should support a batch version of the call