
Prerequisites for using DateRange to read a subset of data?

Open qiuwei opened this issue 5 years ago • 1 comments

Arctic Version

1.79.3

Arctic Store

Version Store

Platform and version

macOS Catalina 10.15.6 (19G73)

Description of problem and/or code sample that reproduces the issue

What are the prerequisites for using DateRange to read a subset of data? From the documentation, it is clear that the DataFrame must have a datetime index (a MultiIndex is supported). After some experimenting and reading the source code, I think there are other requirements, such as:

  1. The datetime index should be sorted.
  2. ~~The start and end of DateRange should be present in the datetime index or None.~~ This is not true. A KeyError only occurs when the datetime index is not sorted.

Could you confirm that my understanding is correct?
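The KeyError described above can be reproduced with pandas alone, without a MongoDB connection. This is a minimal sketch: the sample data and the helper name `slice_by_date` are made up for illustration, and the assumption that VersionStore relies on the same kind of label-based slicing internally is not confirmed by this thread.

```python
import pandas as pd

# Three timestamps in neither ascending nor descending order (illustrative data).
idx = pd.to_datetime(["2022-01-31", "2022-01-20", "2022-01-25"])
unsorted_df = pd.DataFrame({"values": [1, 2, 3]}, index=idx)

# Range bounds that are NOT present in the index.
start, end = pd.Timestamp("2022-01-21"), pd.Timestamp("2022-01-26")


def slice_by_date(df, start, end):
    """Label-based slice; returns None if pandas raises KeyError.

    Hypothetical helper for this example, not part of Arctic.
    """
    try:
        return df.loc[start:end]
    except KeyError:
        return None


# On the unsorted index pandas cannot locate the missing bounds.
unsorted_result = slice_by_date(unsorted_df, start, end)

# After sorting, the same bounds select the rows that fall inside the range.
sorted_result = slice_by_date(unsorted_df.sort_index(), start, end)
```

Here `unsorted_result` is `None` because the slice raised KeyError, while `sorted_result` contains the single row for 2022-01-25.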

qiuwei • Oct 10 '20 12:10

Hi @qiuwei. Confirmed, that is the case: with VersionStore, if you store an unsorted DataFrame, the DateRange subset doesn't work when reading the data back.

Have you considered using ChunkStore instead? It sorts your data when you call ChunkStore.write, so chunk_range should always find the correct subset of the data.

```python
import pandas as pd
from arctic import CHUNK_STORE, Arctic

# Connect to a local MongoDB instance and create a ChunkStore library
dev = Arctic(mongo_host='localhost')
dev.initialize_library('chunkstore', lib_type=CHUNK_STORE)
lib = dev['chunkstore']

# The index is deliberately out of order: Jan 31 comes before Jan 20
df = pd.DataFrame({'date': [pd.Timestamp('20220131'), pd.Timestamp('20220120')], 'values': [1,2]}).set_index(['date'])
lib.write('test_df', df)
```

```
lib.read('test_df')
Out[56]:
            values
date
2022-01-20       2
2022-01-31       1
```
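If you would rather stay with VersionStore, one option is to sort the DataFrame yourself before writing it. The following is only a sketch: `prepare_for_date_range_reads` is a hypothetical helper, not part of Arctic, and it assumes that for a MultiIndex the datetime level is the first level.

```python
import pandas as pd


def prepare_for_date_range_reads(df):
    """Return df with its index sorted, after checking it is datetime-based.

    Hypothetical helper, not an Arctic API. Assumes the datetime level is
    level 0 when the index is a MultiIndex.
    """
    first_level = df.index.get_level_values(0)
    if not isinstance(first_level, pd.DatetimeIndex):
        raise TypeError("expected a datetime index (or datetime first level)")
    if df.index.is_monotonic_increasing:
        return df  # already sorted, nothing to do
    return df.sort_index()


# Illustrative data with an out-of-order index.
raw = pd.DataFrame(
    {"values": [1, 2, 3]},
    index=pd.to_datetime(["2022-01-31", "2022-01-20", "2022-01-25"]),
)
prepared = prepare_for_date_range_reads(raw)

# With a VersionStore library `lib` (requires a running MongoDB, so left as comments):
# lib.write('test_df', prepared)
# lib.read('test_df', date_range=DateRange(start, end))
```

The helper returns a new, sorted DataFrame and leaves the input untouched, so the original order is still available if you need it.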

jasonlocal • Jan 30 '22 18:01