pyreadstat

Chunks from read_file_in_chunks should have increasing index

Open jonashaag opened this issue 2 years ago • 1 comment

When using read_file_in_chunks(..., chunksize=C), each dataframe chunk has a RangeIndex(start=0, stop=C, step=1). With pd.read_sas(..., chunksize=C), each dataframe chunk has a RangeIndex(start=n*C, stop=(n+1)*C, step=1), where n is the current chunk number, counting from 0. I think the pandas behavior makes more sense because it "feels" like you get actual chunks of the whole dataframe.

jonashaag, May 30 '22 10:05

Makes sense. The current way also makes sense. In any case, the change could potentially break people's code, so I am not going to implement it for now. I'll leave this open for the future.

ofajardo, May 30 '22 10:05
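
Until the behavior changes, the pandas-style index can be rebuilt on the caller's side. Below is a minimal sketch, assuming the reader yields (df, meta) pairs as read_file_in_chunks does. The helper name with_running_index and the file path in the usage comment are hypothetical and not part of pyreadstat. It keeps a running row offset rather than computing n*C, so a shorter final chunk is still indexed correctly.

```python
def with_running_index(chunks):
    """Yield (df, meta) pairs whose row index continues across chunks.

    `chunks` is any iterable of (DataFrame, metadata) pairs, for example
    the generator returned by pyreadstat.read_file_in_chunks.
    """
    offset = 0
    for df, meta in chunks:
        n_rows = len(df)
        # Assigning a range lets pandas build a RangeIndex covering
        # offset .. offset + n_rows; the chunk is modified in place.
        df.index = range(offset, offset + n_rows)
        offset += n_rows
        yield df, meta


# Usage sketch ("data.sas7bdat" is a hypothetical path):
#
#   import pyreadstat
#   reader = pyreadstat.read_file_in_chunks(
#       pyreadstat.read_sas7bdat, "data.sas7bdat", chunksize=1000)
#   for df, meta in with_running_index(reader):
#       ...  # first chunk is indexed 0..999, the second 1000..1999, etc.
```

Since the wrapper only touches the index, existing code that relies on each chunk starting at 0 keeps working as long as it does not opt in.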