pyreadstat
Return buffer instead of writing the file directly
Describe the issue
Is it possible to get a buffer back instead of having the file written directly to disk? I have a Flask endpoint running and am facing some problems due to disk usage. For now, I write the file to disk and read it back, but this complicates the flow: disk usage, delays in checking whether the file was written, filename collisions, etc.
Expected behavior
Have an exporter like pyreadstat.write_sav with a flag to return a buffer only or, if needed, a new method like pyreadstat.write_sav_buffer.
Something like this current workaround:
pyreadstat.write_sav(
    df,
    f'/tmp/{filename}',
    column_labels=column_labels,
    variable_value_labels=variable_value_labels,
    variable_format=formats if formats is not None else None,
    row_compress=row_compress,
    compress=compress
)
if os.path.exists(f'/tmp/{filename}'):
    try:
        binary_file = open(f'/tmp/{filename}', "rb").read()
        os.remove(f'/tmp/{filename}')
        #...
It would be appreciated if this could be replaced with
buffer = pyreadstat.write_sav(
    df,
    column_labels=column_labels,
    variable_value_labels=variable_value_labels,
    variable_format=formats if formats is not None else None,
    row_compress=row_compress,
    compress=compress,
    return_buffer=True
)
# OR
buffer = pyreadstat.write_sav_buffer(
    df,
    column_labels=column_labels,
    variable_value_labels=variable_value_labels,
    variable_format=formats if formats is not None else None,
    row_compress=row_compress,
    compress=compress
)
Setup Information:
How did you install pyreadstat? pip
Platform: macOS, but running in a .venv
Python Version: 3.9
Python Distribution: venv
Using Virtualenv or condaenv? yup
Unfortunately it is not possible, because the underlying C library (ReadStat) can only write to disk. A pull request to ReadStat would be needed to implement that there; afterwards I could implement it in pyreadstat.
You can create an issue in ReadStat asking for the feature if you like. There is, however, a PR for buffer-based reading that has been open for a few years without being merged, so I do not think this is coming anytime soon, sorry.
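Until something like that exists, the filename collisions and cleanup steps mentioned in the issue can at least be handled on the caller's side. The following is only a minimal sketch, not a pyreadstat feature: `write_to_bytes` is a hypothetical helper name, the file name `export.sav` is arbitrary, and the file is still written to disk, just inside a unique temporary directory that is removed automatically. It assumes the writer takes the data frame and destination path as its first two arguments, as in the `pyreadstat.write_sav(df, path, ...)` calls above.

```python
import os
import tempfile


def write_to_bytes(writer, df, **writer_kwargs):
    """Hypothetical helper: run a path-based writer and return the file as bytes.

    `writer` is any callable invoked as writer(df, path, **kwargs),
    for example pyreadstat.write_sav.
    """
    # A fresh temporary directory per call avoids filename collisions
    # between concurrent requests; it is deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "export.sav")
        writer(df, path, **writer_kwargs)
        with open(path, "rb") as fh:
            return fh.read()


# Assumed usage (requires pyreadstat and a pandas DataFrame `df`):
# data = write_to_bytes(pyreadstat.write_sav, df, column_labels=column_labels)
```

The returned bytes can then be wrapped in `io.BytesIO` if the web framework expects a file-like object. This does not remove the disk round trip, so it does not help if the disk I/O itself is the bottleneck.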
I see what you're saying, thanks for the answer!