pyodide
pyodide copied to clipboard
Discussion / WIP: Add a very minimal requests shim and its dependencies
Description
Currently the Pyodide project can install Python requests
, because it is a pure Python library. It will not work however, since it depends on the http
library. I think it could be nice to support / maintain a version of requests
that proxies any input it gets to javascript XMLHttpRequest
, using the js
module provided by pyodide itself. I have made a very basic and minimal version of this "requests
shim" by patching the Session.request
method and the models.Response
class.
This can already do some basic things well enough, and works like a charm for my current project, where I don't need everything requests
offers. I also think a pyodide version of requests
shouldn't necessarily produce the exact same results as CPython requests
. The browser already knows quite well which headers and cookies to send for example. Where possible, it's probably best to rely on the browser's opinion about what an acceptable HTTP request should contain. On top of that, some parts of requests
are probably impossible to support at all, like the verify=False
option for disabling SSL checks.
To see the changes I made in pyodide-requests-shim.patch
presented nicely, check the diff at this PR: https://github.com/bartbroere/requests/pull/1
This is definitely still a work-in-progress PR, but I hope it can be a discussion starter: Could a patch of requests
that proxies calls to Javascript become a part of the pyodide project?
Checklists
Not all of these steps are necessary for every PR.
- [ ] Add a CHANGELOG entry or explain why an entry is not needed
- [ ] Add / update tests
- [ ] Add new / update outdated documentation
Thanks for working on this @bartbroere ! This goes toward https://github.com/pyodide/pyodide/issues/140
Could a patch of requests that proxies calls to Javascript become a part of the pyodide project?
We clearly need to have better support for the file downloads in Python. This likely means that we would need to patch something, but the question is at what level it should happen. Ideally patching http.client (@hoodmane did some work in https://github.com/pyodide/pyodide/issues/140#issuecomment-897888308) could mean that urllib3 and requests would work without patching. But in practice I'm not convinced that this would really be the case, and maybe we would need to patch requests as you are doing here in any case.
In any case experimenting with this is very useful.
Thanks in advance @bartbroere! I am also interested in patching requests.
Patching http.client
would be ideal, but I don't think it's easy to emulate socket behavior fully in javascript.
So for a starter, patching the most used high-level HTTP library, requests, would be very helpful.
Nice to hear that this PR could contribute some functionality, and that patching requests
might be the way to achieve this relatively easily.
Shall I continue working on this patch? For a minimal version I think I should address the following issues at least.
- [x] Use
ArrayBuffer
instead ofBlob
for binary responses, ~since Node does not haveBlob
~ since we convert it toArrayBuffer
anyway - [x] Find out if we need to do something special for gzipped responses (
Response.iter_content()
vsResponse.raw
). My guess is that the un-gzipped data stream might not be accessible anymore, since the browser will un-gzip it eagerly. Most use cases will need the decompressed version anyway. - [x] Make
Response.iter_content()
obey the suppliedchunk_size
- [x] Implement supplying headers as list of tuples instead of a mapping
- [x] Implement other possible arguments for the
data
parameter, not just dictionaries (sent as json), but also list of tuples, ~bytes or a file object~ - [x] Debug-level logging for anything that works slightly different than normal requests
- [x] Warning-level logging for anything that isn't supported at all
Edit, added:
- [x] ~If users supply the header
Accept-Encoding
or one of the other forbidden headers we need to have workarounds or perform alternative calls~
And of course all the things that are already in the pull request template (changelog, docs, tests)
Do you agree this is enough for a first version? Am I missing something?
Since requests
is a pure python package, maybe making this a separate package (e.g. pyodide-requests
) and uploading it to PyPI might be a better solution. Because large patch files are hard to maintain.
Then, we can make calling import pyodide_requests
have a side effect of overriding import requests
so that other packages using requests
can be used without any modification.
we can make calling import pyodide_requests have a side effect of overriding import requests
Namely by saying
import sys
sys.modules["requests"] = pyodide_requests
we can make calling import pyodide_requests have a side effect of overriding import requests
Namely by saying
import sys sys.modules["requests"] = pyodide_requests
I'm not sure if having a package like pyodide_requests
would result in a good user experience (especially in products like Starboard Notebook). If a user depends on a package like discord.py
, they should realize it needs requests
and do something like:
import micropip
await micropip.install('discord.py')
await micropip.install('pyodide_requests')
import pyodide_requests # to get the side effect of the `requests` override
import discord
If we patch requests
users can rely on micropip
finding the patched package:
import micropip
await micropip.install('discord.py')
import discord
With this solution code can remain a lot more similar to existing Python code.
Because large patch files are hard to maintain.
I think maintaining the patch is doable if you would maintain a fork of the original requests
repository. My current setup to get the patch I checked in is:
- In the fork of
requests
there are two branches,main
andpyodide
- If
requests
is updated, pullmain
from upstreamrequests
- Check if there are any new merge conflicts and resolve those
- Get the new diff between the heads of the
pyodide
andmain
branch and put it in thepatches
directory - Update the
url
andsha256
values inpackages/requests/meta.yaml
To make the patch more readable, I could also split it in one that patches models.Response
and one that patches Session.request
.
I'm not sure if having a package like
pyodide_requests
would result in a good user experience (especially in products like Starboard Notebook).
I agree that it will hurt the user experience. But explicitly informing users that they are using the patched version of requests is somewhat meaningful I guess.
I think patching requests would be a good starter, and if someday it becomes problematic, we can find other options.
I think maintaining the patch is doable if you would maintain a fork of the original requests repository
If you have a fork with those fixes, IMO there is no point in creating patches. You can tag to make releases and directly point to the tag as the download URL.
Also since it's a Python package, there isn't even a point in packaging it with the Pyodide build system really. We could just create a wheel, upload to PyPi and install from there.
The concern there is about naming, and how it's loaded by micropip. I opened https://github.com/pyodide/micropip/issues/9 to discuss it. If some redirect mechanism is implemented in micropip, so that when micropip.install('requests')
actually loads pyodide-requests
, you could work on this completely independently from the pyodide release cycle which would be much easier IMO. If you are interested in maintaining this package/fork over time, we could then also move to the pyodide org if you are interested.
If some redirect mechanism is implemented in micropip, so that when
micropip.install('requests')
actually loadspyodide-requests
, you could work on this completely independently from the pyodide release cycle which would be much easier IMO.
This sounds nice. If this redirect mechanism exists, I think the user experience of a requests
fork with a different name (like pyodide-requests
) would be good. If they simply use micropip.install
they wouldn't need to do anything special.
If you are interested in maintaining this package/fork over time, we could then also move to the pyodide org if you are interested.
I'm definitely willing to transfer the fork of requests
to the pyodide organisation and maintain it. In its current state I think it's already an improvement over no requests
functionality at all, but there are still some use cases left unsupported (most importantly binary responses). Currently it uses synchronous XMLHttpRequest
, limiting what responseType
can be used. If I were to continue working on it, perhaps I would like to have it use the more modern fetch
API offered by Javascript. This might not be necessary for a first version though, as long as we warn users when they hit the not-yet-supported bits.
If I were to continue working on it, perhaps I would like to have it use the more modern fetch API offered by Javascript.
The issue there is we can't currently easily use async fetch API from a sync Python function https://github.com/pyodide/pyodide/issues/140#issuecomment-856810537 and https://github.com/pyodide/pyodide/issues/1503
If I were to continue working on it, perhaps I would like to have it use the more modern fetch API offered by Javascript.
The issue there is we can't currently easily use async fetch API from a sync Python function #140 (comment) and pyodide/pyodide#1503
I think delegating synchronous requests to the Web Worker through pthread/Comlink would be promising, though I didn't look into that much.
Hello, I was looking into making a requests shim for a personal project when I came across an an interesting requests
feature. Instead of rewriting requests
as pyodide_requests
, would it be possible to write a custom Transport Adapter for the http[s]://
prefix? I have produced a very minimal example that seems to work well without a rewrite:
import requests
import io
import js
import pyodide
def init():
def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
body = request.body
jsreq = js.XMLHttpRequest.new()
jsreq.open(request.method.upper(), request.url, False)
if timeout:
jsreq.timeout = timeout * 1000
for header, value in request.headers.items():
jsreq.setRequestHeader(header, value)
try:
jsreq.send(body)
except pyodide.JsException as err:
raise requests.HTTPError("A JavaScript exception was raised during the processing of this request: {err.js_error}")
response = requests.Response()
response.status_code = jsreq.status
headers = dict(map(lambda i: tuple(i.split(': ', 1)), jsreq.getAllResponseHeaders().split("\r\n")[:-1]))
response.headers = headers
response._content = jsreq.responseText
response.req = request
response._jsreq = jsreq
response.connection = self
response.url = jsreq.responseURL
response.reason = "" if " " not in jsreq.statusText else jsreq.statusText.split(" ", 1)[1]
return response
requests.adapters.HTTPAdapter.send = send
init()
requests.get("https://httpbin.org/anything")
Obviously, there are still a few edge cases that need to be addressed, and something like this will only work in Workers (because of sync XHR requests). However, overriding the default transport adapter is a viable alternative to a rewrite.
However, overriding the default transport adapter is a viable alternative to a rewrite.
Nice! This seems like a very straightforward way to override the behaviour of requests
, and have it in a separate package at the same time. Instead of having the user call an init()
method however, I think it would be nice to rely on some redirect mechanism as discussed in pyodide/micropip#9. Because my primary use case involves notebooks, I like it if the user has to do as little Pyodide specific tricks as possible.
Last month I didn't have a lot of time to work on this, but working on it before, I became less and less convinced that XMLHttpRequest
is the best part of the Javascript API to use. Major drawback is that it can't handle raw responses in synchronous mode easily. The newest pyodide
release (0.19) contains work to proxy Javascript's asynchronous fetch
(pyfetch
in Pyodide). Perhaps the requests
shim could also use pyfetch
.
Thank you for your work on this @bartbroere !
I think now it has become clear that we do not want such package to be part of the pydide monorepo. Instead it should be in a standalone repository, uploaded to PyPi and installable via micropip
. That would be much easier maintenance wise, and allow for faster iteration.
There are for instance https://github.com/koenvo/pyodide-http and https://github.com/emscripten-forge/requests-wasm-polyfill projects in this space.
So I'm going to close this PR, but please don't hesitate to contribute your work to either of those packages (or to create a new one).
Yes, I agree that a patch on top of the original requests is probably not the best solution. Closing this PR is fine.
As a temporary hack it serves me well for now, but I will also migrate our internal code to the polyfill with the webworker approach. Not having to block the main thread is a big advantage!
For anyone visiting this thread now, note that there now is a better solution here: https://github.com/koenvo/pyodide-http
It's also included in default Pyodide builds, so you probably can get requests
to work by following the README.