pyreadstat
Chunks from read_file_in_chunks should have increasing index
When using `read_file_in_chunks(..., chunksize=C)`, each dataframe chunk has a `RangeIndex(start=0, stop=C, step=1)`. With `pd.read_sas(..., chunksize=C)`, each dataframe chunk has a `RangeIndex(start=n*C, stop=(n+1)*C, step=1)`, where `n` is the current (zero-based) chunk number. I think the pandas implementation makes more sense because it "feels" like you get actual chunks of the whole dataframe.
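Until the behavior changes, the pandas-style index can be recovered on the caller's side by offsetting each chunk's index. Below is a minimal sketch, assuming pandas is installed. `offset_chunks` and `fake_chunks` are hypothetical helpers, not part of pyreadstat. `fake_chunks` only simulates `read_file_in_chunks`, which yields `(df, meta)` tuples whose index restarts at 0 for every chunk.

```python
from typing import Any, Iterable, Iterator, Tuple

import pandas as pd


def offset_chunks(
    chunks: Iterable[Tuple[pd.DataFrame, Any]],
) -> Iterator[Tuple[pd.DataFrame, Any]]:
    """Hypothetical helper: relabel each (df, meta) chunk so its RangeIndex
    continues where the previous chunk stopped, like pd.read_sas chunks."""
    start = 0
    for df, meta in chunks:
        n = len(df)
        # Running row count instead of n*C, so a shorter final chunk is
        # still labelled correctly (identical to n*C for full chunks).
        df.index = pd.RangeIndex(start=start, stop=start + n, step=1)
        start += n
        yield df, meta


def fake_chunks(total_rows: int, chunksize: int) -> Iterator[Tuple[pd.DataFrame, Any]]:
    """Hypothetical stand-in for pyreadstat.read_file_in_chunks:
    every chunk gets a fresh RangeIndex starting at 0."""
    for begin in range(0, total_rows, chunksize):
        rows = list(range(begin, min(begin + chunksize, total_rows)))
        yield pd.DataFrame({"x": rows}), None


if __name__ == "__main__":
    for df, _ in offset_chunks(fake_chunks(total_rows=10, chunksize=4)):
        print(df.index)
```

With the real reader, the wrapper would be applied as `offset_chunks(pyreadstat.read_file_in_chunks(pyreadstat.read_sas7bdat, path, chunksize=C))`, where `path` is a placeholder for your file. Keeping the offset in a wrapper leaves the library's current behavior untouched, so existing code that relies on chunks starting at 0 is unaffected.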
Makes sense. The current way also makes sense, though. In any case, the change could potentially break people's code, so I am not going to implement it for now; I'll leave it here for the future.