ArcticDB
Get df shape (row count) from a lazy read or QueryBuilder without loading the data
Is your feature request related to a problem? Please describe. I want a simple operation. I have a huge dataset with a timestamp index, and I want just the number of rows that fall in a date range. In future this may be any type of filter. I tried reading with columns=[] and with just a single column, but the data is so large that even a single year of it exhausted 128 GB of RAM.
Describe the solution you'd like Either a count or a shape function that aggregates the row count while going over each segment, instead of loading the whole index into memory so that I then have to call len(df).
Describe alternatives you've considered I tried the two approaches below, but both still ran out of memory.
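import pandas as pd
from arcticdb import QueryBuilder
# store_new is assumed to be an existing Arctic instance (its creation is not shown).
# Attempt 1: lazy read with no data columns, then count the rows after collect().
lazy_df = store_new["library"].read("symbol", columns=[], lazy=True)
print(lazy_df.collect().data.shape[0])
# or, attempt 2: date-range filter via QueryBuilder, reading a single column.
q = QueryBuilder()
q = q.date_range((pd.Timestamp("2023-01-01"), pd.Timestamp("2024-01-01")))
q.optimise_for_speed()  # called on its own line so q stays a QueryBuilder whatever this returns
df = store_new["library"].read("symbol", columns=["ID"], query_builder=q).data
print(len(df))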
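
Until something like a built-in count exists, a possible workaround is to split the date range into smaller windows and add up the row counts, so that only one window's worth of data is in memory at a time. The following is a minimal sketch of that idea, not an ArcticDB feature: `iter_windows`, `count_rows`, and `read_window` are hypothetical names, and the in-memory `stamps` list with `fake_read` only stands in for a real store so the example runs on its own.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, Sized, Tuple


def iter_windows(start: datetime, end: datetime,
                 step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open windows [lo, hi) that cover [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the last window is clipped to `end`
        yield lo, hi
        lo = hi


def count_rows(read_window: Callable[[datetime, datetime], Sized],
               start: datetime, end: datetime,
               step: timedelta = timedelta(days=7)) -> int:
    """Sum len() of each window's result; each chunk can be freed before the next read."""
    total = 0
    for lo, hi in iter_windows(start, end, step):
        total += len(read_window(lo, hi))
    return total


# Stand-in data: 1000 timestamps, 12 hours apart, starting at 2023-01-01.
stamps = [datetime(2023, 1, 1) + timedelta(hours=12 * i) for i in range(1000)]


def fake_read(lo: datetime, hi: datetime) -> list:
    """Hypothetical reader: returns the rows with lo <= timestamp < hi."""
    return [t for t in stamps if lo <= t < hi]


total = count_rows(fake_read, datetime(2023, 1, 1), datetime(2024, 1, 1))
print(total)
```

With ArcticDB, `read_window` could presumably wrap a call such as `lib.read("symbol", date_range=(lo, hi), columns=["ID"]).data`. That wiring is an assumption and is not tested here. In particular, if `date_range` treats both bounds as inclusive, the upper bound would need to be moved back slightly so rows on a window boundary are not counted twice. The window size trades memory against the number of reads.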