pyreadstat

Support reading from file-like objects (streaming input)

Open slobodan-ilic opened this issue 3 months ago • 2 comments

Currently, pyreadstat only accepts file paths as input. This means large (5GB+) files must be extracted from zip archives to disk before they can be read. It would be more efficient to allow passing a file-like object directly (for example, one returned by zipfile.ZipFile.open()), avoiding the unnecessary disk I/O.

slobodan-ilic avatar Oct 13 '25 17:10 slobodan-ilic
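
For illustration, the path-only workflow described above can be sketched as follows. This is a minimal sketch and not code from the issue: the helper name `read_zip_member_via_disk` is hypothetical, and `reader` stands in for any path-based reader (for example `pyreadstat.read_sav`), passed as a parameter so the sketch depends only on the standard library.

```python
import tempfile
import zipfile


def read_zip_member_via_disk(zip_path, member, reader):
    """Extract one archive member to a temporary directory, then parse it.

    `reader` is any callable that accepts a file path. The helper name and
    this calling convention are illustrative assumptions.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(zip_path) as archive:
            # The whole member is written to disk here. For a 5GB+ file this
            # is the extra I/O that a file-like input would avoid.
            extracted_path = archive.extract(member, path=tmp_dir)
        # The reader must finish with the file before the directory is removed.
        return reader(extracted_path)
```

With file-like support, the extraction step would disappear and the object returned by `zipfile.ZipFile.open()` could be handed to the reader directly.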

Great initiative, thanks! Would this then solve #8 and #274? In particular, could a remote file be read into a file-like object and then passed to the new method?

ofajardo avatar Oct 14 '25 12:10 ofajardo

I haven't tested it with remote files, but it's quite possible it would solve that. ReadStat already supports file handles; it's just a matter of using them correctly. I added a draft, #309, to try to implement this, and I'm waiting for our team to try it out internally. If it works for what they need, I'll clean up the PR so it is a proper unit of work.

slobodan-ilic avatar Oct 14 '25 14:10 slobodan-ilic
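
As a rough illustration of the kind of file-like object discussed in this thread, the sketch below probes what a stream from `zipfile.ZipFile.open()` offers. It is not taken from the draft PR: the helper name `probe_zip_member` is hypothetical, and the assumption that a parser would need both `read` and `seek` is mine rather than something stated above.

```python
import zipfile


def probe_zip_member(zip_path, member, n=4):
    """Open an archive member as a stream and report what it supports.

    Hypothetical helper for illustration only. Nothing is extracted to disk;
    bytes are decompressed on demand as they are read.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as stream:
            head = stream.read(n)
            # Seeking backwards works on seekable archives, but a compressed
            # member is re-read from its start, so frequent seeks can be slow.
            stream.seek(0)
            again = stream.read(n)
            return {
                "readable": stream.readable(),
                "seekable": stream.seekable(),
                "head": head,
                "rewound_ok": head == again,
            }
```

Whether remote files fit the same pattern depends on the remote object also providing these methods, which this sketch does not test.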