ArcticDB

df shape in lazy loading or query builder for optimised query

Open himsheda opened this issue 2 months ago • 1 comments

Is your feature request related to a problem? Please describe. I want a simple operation. I have a very large dataset with a timestamp index, and I want only the number of rows that match a date-range query. In future this may extend to any type of filter. I tried reading with columns=[] and with just a single column, but the data is so large that even a single year of it exhausted 128 GB of RAM.

Describe the solution you'd like Either a count or a shape function that accumulates the row count while iterating over each segment, instead of loading the whole index into memory so that I have to call len(df).

Describe alternatives you've considered I tried both of the following, but each still ran out of memory:

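import pandas as pd
from arcticdb import QueryBuilder

# store_new is assumed to be an already-connected Arctic instance.
# Attempt 1: lazy read with no data columns, then count the collected rows.
lazy_df = store_new["library"].read("symbol", columns=[], lazy=True)
print(lazy_df.collect().data.shape[0])

# Attempt 2: restrict to one year via QueryBuilder and read a single column.
q = QueryBuilder()
q = q.date_range((pd.Timestamp("2023-01-01"), pd.Timestamp("2024-01-01"))).optimise_for_speed()
df = store_new["library"].read("symbol", columns=["ID"], query_builder=q).data
print(len(df))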

himsheda, Sep 10 '25 07:09