
Return buffer instead of writing the file directly

Open hfabio opened this issue 1 year ago • 2 comments

Describe the issue
Is it possible to get a buffer back instead of writing the file directly to disk? I have a Flask endpoint running and am facing some problems due to disk usage. For now, I'm writing the file to disk and reading it back, but that complicates the flow: extra disk usage, delays while checking whether the file was written, filename collisions, etc.

Expected behavior
An exporter like pyreadstat.write_sav with a flag to return only a buffer or, if needed, a new method like pyreadstat.write_sav_buffer.

something like

pyreadstat.write_sav(
        df,
        f'/tmp/{filename}',
        column_labels=column_labels,
        variable_value_labels=variable_value_labels,
        variable_format=formats if formats is not None else None,
        row_compress=row_compress,
        compress=compress
      )
if os.path.exists(f'/tmp/{filename}'):
    try:
        # read the bytes back, closing the file handle before deleting the file
        with open(f'/tmp/{filename}', "rb") as f:
            binary_file = f.read()
        os.remove(f'/tmp/{filename}')
        #...

would be appreciated if it could be replaced with

buffer = pyreadstat.write_sav(
        df,
        column_labels=column_labels,
        variable_value_labels=variable_value_labels,
        variable_format=formats if formats is not None else None,
        row_compress=row_compress,
        compress=compress,
        return_buffer=True
      )
# OR
buffer = pyreadstat.write_sav_buffer(
        df,
        column_labels=column_labels,
        variable_value_labels=variable_value_labels,
        variable_format=formats if formats is not None else None,
        row_compress=row_compress,
        compress=compress
      )

Setup Information:
How did you install pyreadstat? pip
Platform: macOS, running in a .venv
Python Version: 3.9
Python Distribution: venv
Using Virtualenv or condaenv? yes

hfabio Apr 01 '24 21:04

Unfortunately it is not possible, because the underlying C library (ReadStat) can only write to disk. A pull request to ReadStat would be needed to implement that there; afterwards I could implement it in pyreadstat.

You can create an issue in ReadStat asking for the feature if you like. However, there is a PR for buffer-based reading that has been open for a few years without being merged, so I do not think this is coming anytime soon, sorry.
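
As a stopgap on the caller's side, the disk round trip can at least be wrapped so that filename collisions and cleanup stop being a concern. The following is only a minimal sketch: `write_to_buffer` is a hypothetical helper, not part of pyreadstat or ReadStat, and it assumes a writer that is called as `writer(data, path, **kwargs)`, which is how `pyreadstat.write_sav(df, dst_path, ...)` is invoked in the snippet above.

```python
import io
import os
import tempfile


def write_to_buffer(writer, data, suffix=".sav", **kwargs):
    """Run a path-based writer on a unique temp file and return its bytes.

    Hypothetical helper (not part of pyreadstat). `writer` is any callable
    of the form writer(data, path, **kwargs), e.g. pyreadstat.write_sav.
    """
    # A private temporary directory gives a collision-free path; the
    # directory and the file inside it are removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "export" + suffix)
        writer(data, path, **kwargs)
        with open(path, "rb") as fh:
            # Copy the contents into memory before the file is deleted.
            return io.BytesIO(fh.read())
```

With pyreadstat this would presumably be called as `write_to_buffer(pyreadstat.write_sav, df, column_labels=column_labels)`. The data still passes through the disk briefly, so this does not remove the I/O cost; it only takes the naming, existence checks, and deletion out of the endpoint code.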

ofajardo Apr 16 '24 18:04

I see what you're saying, thanks for the answer!

hfabio Apr 17 '24 00:04