pyscript
Improve documentation on how to handle binary and non-binary files (local/remote, up-/download)
Checklist
- [X] I added a descriptive title
- [X] I searched for other issues and couldn't find a duplication
- [X] I already searched in Google and didn't find any good information or help
What is the issue/comment/problem?
There are a few issues around here concerned with file handling (#588, #558, #463, #151 amongst others). It would be nice to have a dedicated section in the docs with the recommended way of doing things for binary and non-binary files. Summed up:
Local
- Load local file to browser (covered here or here)
- Download file from browser to local (two examples here with file picker, but non-binary data only)
Remote
- Load remote file to browser (covered in #588)
- ~Download file from browser from remote~ (that should hopefully be impossible)
Because binary files (e.g. Excel or, more generally, zip files) behave differently from non-binary ones, it would be very useful to document the distinction explicitly; otherwise one stumbles over missing awaits or similar problems.
I think most of the above points are already described somewhere but I'm missing an example of how to conveniently access the virtual file system in order to download something locally.
Let's consider this:
from pyodide.http import pyfetch
import asyncio
import pandas as pd
import openpyxl
from io import BytesIO
response = await pyfetch(url="/downloads/test.xlsx", method="GET")
bytes_response = await response.bytes()
df = pd.read_excel(BytesIO(bytes_response))
df
That's the (currently) easiest way of loading binary files. If I call df.to_excel("test_output.xlsx") and df.to_csv("test_output.csv") pandas will save the output to the virtual file system.
What's the best way of automatically starting the download from the browser to local when pandas is done saving to the virtual file system or could this even be skipped in some way? Do we need to use some js proxy, js buffer for the hooks or would you simply use some pyodide function for this?
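On the "could this even be skipped" part: pandas writers such as `df.to_csv` and `df.to_excel` accept file-like objects as well as file names, so the output can go into an in-memory buffer instead of the virtual file system. Below is a minimal sketch of that buffer pattern using only the standard library, so it runs without pandas. The `rows` data and the `data.csv` member name are made up for illustration, and the zip archive merely stands in for a binary format like .xlsx (which is itself a zip archive).

```python
import csv
import io
import zipfile

# Hypothetical sample data standing in for a DataFrame.
rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

# Text output: write CSV into an in-memory string buffer.
text_buffer = io.StringIO()
writer = csv.DictWriter(text_buffer, fieldnames=["a", "b"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = text_buffer.getvalue()  # str, ready to hand to a text download

# Binary output: write a zip archive into an in-memory bytes buffer.
binary_buffer = io.BytesIO()
with zipfile.ZipFile(binary_buffer, mode="w") as archive:
    archive.writestr("data.csv", csv_text)
zip_bytes = binary_buffer.getvalue()  # bytes, ready to hand to a binary download
```

With pandas the same idea would presumably look like `df.to_excel(binary_buffer)` followed by `binary_buffer.getvalue()`, which yields the bytes without touching any file system.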
I'm not sure if this issue was discussed last week when I wasn't around. But maybe @antocuni has an opinion on this? Or should I ping Fabio here? Thanks! And thanks @do-me for opening the issue =)
What's the best way of automatically starting the download from the browser to local when pandas is done saving to the virtual file system or could this even be skipped in some way?
AFAIK we don't have a PyScript specific way to do it, so currently the best way is to use pyodide. This stackoverflow answer shows a possible solution: https://stackoverflow.com/questions/64669355/how-to-copy-download-file-created-in-pyodide-in-browser
Yes, we should provide a more straightforward way of doing it. Yes, we should definitely improve the docs :).
Thanks, later I'll look into it. Meanwhile I might have found a different cross-browser solution for downloading blobs, will test later and update here. Once I get this running, I'll document everything and set up minimal examples for every variant.
Just sharing some WIP in case anyone needs it asap. Loading a remote excel file .xlsx, reading as pandas df and downloading as .csv with the file picker solution.
Requires a download HTML button on the page, e.g. <button id="download">Download</button>
from io import BytesIO

import openpyxl  # Excel engine used by pandas.read_excel
import pandas as pd
from js import Object, console, document, window
# Older Pyodide releases exposed these as `from pyodide import create_proxy, to_js`
from pyodide.ffi import create_proxy, to_js
from pyodide.http import pyfetch

async def load_df():
    # Fetch the remote workbook as bytes and convert it to CSV text
    response = await pyfetch(url="/downloads/test.xlsx", method="GET")
    bytes_response = await response.bytes()
    df = pd.read_excel(BytesIO(bytes_response))
    content = df.to_csv()  # returns a string when no file name is given
    return content

async def file_save(event):
    try:
        options = {
            "startIn": "downloads",
            "suggestedName": "test_123456.csv",
        }
        fileHandle = await window.showSaveFilePicker(Object.fromEntries(to_js(options)))
    except Exception as e:
        # Raised e.g. when the user cancels the picker or the API is unavailable
        console.log("Exception: " + str(e))
        return

    content = await load_df()
    file = await fileHandle.createWritable()
    await file.write(content)
    await file.close()

def setup_button():
    # Create a Python proxy for the callback function
    file_save_proxy = create_proxy(file_save)
    # Set the listener to the callback
    document.getElementById("download").addEventListener("click", file_save_proxy, False)

setup_button()
I'm working on a) cross-browser functionality as file picker isn't working in Firefox and b) blob (= e.g. xlsx files) downloads.
Not sure if this is super related, but we created a WordPress plugin around PyScript, and most of the examples work on the site. However, it always throws this error when we try to read a remote csv file in pandas or even just read a remote URL. Is this related to encoding? The weird thing is that the other examples, including the matplotlib one, work.

Hi @hellozeyu this is not related as your URL is simply wrong. You're trying to read a csv from the GitHub landing page https://github.com/. Insert the real link (raw csv file, not the repo) and it should work. E.g. this one.
Ah sorry, thought this was pandas-related. You cannot work with the urllib or requests package in pyscript but need to use the pyodide alternatives. See this example.
Got it. It works for me. Thanks!
@do-me Do you think the solution linked in this issue fits your use case? https://github.com/pyscript/pyscript/issues/756 Also, I think you already found a solution? The last time we talked you said you had something almost working. Let me know if you need help, we can sync =) I think it'd be really cool to have docs on it, and we could write them in the style of a "how to": just some code snippets that work plus a short explanation of why they work the way they do would be perfect. Jeff Glass contributed something like this last week: https://docs.pyscript.net/latest/howtos/passing-objects.html The one about output could be much shorter though.
Hi @marimeireles! Thanks for coming back to this issue. I'm not at home this week but I'll have a look at the new docs next week. Looks promising!
I might have found a way for the last missing piece (binary downloads like excel) via octet streams and DOM manipulation. I didn't find the time yet to test properly, but as soon as I succeed, I'll come back here! So technically the issue is not yet 100% solved I'd say.
Great idea for the snippet-style docs - I think that really suits the spirit of pyscript!
Alright! :) I'm around just ping me.
I finally found the time for testing binary downloads from the virtual file system. I wrote a simple function that takes care of everything and saves a pandas excel export to the local file system:
import base64
from io import BytesIO

import openpyxl  # Excel engine used by pandas
import pandas as pd
from js import document
from pyodide.http import pyfetch

def pandas_excel_export(df, filename):
    # save to virtual filesystem
    df.to_excel(filename + ".xlsx")

    # binary xlsx to base64 encoded downloadable string
    # (read back the file that was just written, not a hard-coded name)
    with open(filename + ".xlsx", "rb") as f:
        data = f.read()
    base64_encoded = base64.b64encode(data).decode("UTF-8")
    octet_string = "data:application/octet-stream;base64,"
    download_string = octet_string + base64_encoded

    # create new helper DOM element, click (download) and remove
    element = document.createElement("a")
    element.setAttribute("href", download_string)
    element.setAttribute("download", filename + ".xlsx")
    element.click()
    element.remove()

# import
response = await pyfetch("/downloads/test.xlsx", method="GET")
bytes_response = await response.bytes()

# read from bytes
df = pd.read_excel(BytesIO(bytes_response))

# manipulate
df["d"] = df["a"] + df["b"]

# export
pandas_excel_export(df, "test")
Working example here.
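The string-building step of the export function does not depend on the browser, so it can be pulled out and checked on its own. The following is only a sketch: `build_data_url` and `XLSX_MIME` are names introduced here for illustration and are not part of PyScript or Pyodide. The generic `application/octet-stream` type used above works for downloads; the more specific .xlsx MIME type is shown as an optional alternative.

```python
import base64

# MIME type registered for .xlsx workbooks (optional alternative to octet-stream).
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_data_url(data: bytes, mime_type: str = "application/octet-stream") -> str:
    """Return a base64 data URL usable as the href of a download link.

    Hypothetical helper: it only formats a string and does not touch the DOM.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first four bytes of any zip-based file (including .xlsx) as sample input.
example_url = build_data_url(b"PK\x03\x04", XLSX_MIME)
```

In the browser, the result would be passed to `element.setAttribute("href", ...)` exactly as in the function above. Since the whole file ends up inside one string, this approach is probably best kept for small to medium files.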
Coming back to the original purpose of this issue, I think we have everything we need to improve the documentation!
What do you think about a dedicated `File Handling` section in the docs under Getting Started? Or do you think it belongs in the How-to section instead?
I am preparing a dedicated blog post in the spirit of the original issue description (local/remote & import/export & non-binary/binary data) that could serve as a base for further discussion.
I am closing this for the following reasons:
- we now offer a `fetch(...).bytearray()` to solve the conversion issue
- we have documented how to write, read, upload, download files via latest PyScript
- binary VS non binary is still a matter of `open(..., 'rb')` VS `open(..., 'r')` so I hope we covered it all
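For readers landing here later, a small sketch of the `'rb'` versus `'r'` difference using plain Python: the same file yields `bytes` in binary mode and `str` in text mode. The file name `sample.csv` and its content are made up for illustration, and a temporary directory is used so the snippet runs anywhere.

```python
import os
import tempfile

payload = "naïve,1\n"  # contains one non-ASCII character on purpose

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.csv")

    # Text mode: Python encodes/decodes for you (newline="" keeps "\n" as-is).
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(payload)
    with open(path, "r", encoding="utf-8", newline="") as handle:
        as_text = handle.read()  # str

    # Binary mode: raw bytes, no decoding; required for xlsx, zip, images, ...
    with open(path, "rb") as handle:
        as_bytes = handle.read()  # bytes
```

Here `as_bytes` is one byte longer than `as_text` has characters, because `ï` takes two bytes in UTF-8. Opening a truly binary file such as an .xlsx in text mode would typically fail with a decoding error or corrupt the data, which is why binary mode is the safe choice there.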