Emscripten support
Summary
I added a discussion for this ages back but there's been no input, so I've written it (because I was contracted to do the work anyway, so I might as well contribute it upstream). This PR adds support for running in emscripten / webassembly platforms, where all network connections go via the browser.
This is still in progress, but the tests pass locally, so I've opened it to check the CI changes. I still need to update the docs.
Checklist
- [x] I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
- [X] I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
- [x] I've updated the documentation accordingly.
This is really interesting, thanks... ☺️
I've taken a bit of a look at the ecosystem here, tho am going to need a bit more orientation... Would it make sense to document an example of how to write an HTML page that includes a Python REPL with httpx imported and available?
Cool.
Related https://github.com/python/steering-council/issues/256
@tomchristie I added some docs, including a page which is a live demo, along with instructions for hosting it yourself. If you clone this PR and then run scripts/build and scripts/docs you should be able to see the emscripten port working in Chrome (on the advanced/emscripten page).
If this gets merged I can contribute this to the main pyodide distribution. Once that is done it would mean that import httpx would just work in pyodide environments.
Okay, really interesting... I've had a bit of a play around with this tho could do with walking through from the basics, if you're able to spend the time...
I'd like to start by getting to the point that I can add a custom transport to httpx in the pyodide console...
Here's my starting steps...
Open up https://pyodide.org/en/latest/console.html
Install httpx. It's not built-in, okay that's expected. It does load with micropip, which makes sense since it's pure python. Oddly ssl needs to be imported first(?). After that it can be imported just fine. 👍
Welcome to the Pyodide 0.27.0.dev0 terminal emulator 🐍
Python 3.12.1 (main, Oct 7 2024 14:46:27) on WebAssembly/Emscripten
Type "help", "copyright", "credits" or "license" for more information.
>>> import httpx
Traceback (most recent call last):
File "<console>", line 1, in <module>
ModuleNotFoundError: No module named 'httpx'
>>> import micropip
>>> await micropip.install('httpx')
>>> import httpx
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/lib/python3.12/site-packages/httpx/__init__.py", line 2, in <module>
from ._api import *
File "/lib/python3.12/site-packages/httpx/_api.py", line 6, in <module>
from ._client import Client
File "/lib/python3.12/site-packages/httpx/_client.py", line 12, in <module>
from ._auth import Auth, BasicAuth, FunctionAuth
File "/lib/python3.12/site-packages/httpx/_auth.py", line 12, in <module>
from ._models import Cookies, Request, Response
File "/lib/python3.12/site-packages/httpx/_models.py", line 11, in <module>
from ._content import ByteStream, UnattachedStream, encode_request, encode_response
File "/lib/python3.12/site-packages/httpx/_content.py", line 17, in <module>
from ._multipart import MultipartStream
File "/lib/python3.12/site-packages/httpx/_multipart.py", line 8, in <module>
from ._types import (
File "/lib/python3.12/site-packages/httpx/_types.py", line 5, in <module>
import ssl
ModuleNotFoundError: No module named 'ssl'
>>> import ssl
>>> import httpx
The models work as expected...
>>> httpx.Request('GET', 'https://www.example.com')
<Request('GET', 'https://www.example.com/')>
Tho we can't send requests, again, as expected...
>>> httpx.get('https://www.example.com')
Traceback (most recent call last):
...
httpx.ConnectError: [Errno 23] Host is unreachable
Okay, so next step, I need to figure out how to send a JS request/response in this console, so I can then implement a transport class using that. Let's try XMLHttpRequest?...
>>> import js
>>> js_xhr = js.XMLHttpRequest.new()
>>> js_xhr.open('GET', 'http://www.example.com/', False)
>>> js_xhr.send()
Traceback (most recent call last):
File "<console>", line 1, in <module>
pyodide.ffi.JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'http://www.example.com/'.
Okay, how about using fetch?...
>>> await js.fetch('https://www.example.com/')
Traceback (most recent call last):
File "<console>", line 1, in <module>
pyodide.ffi.JsException: TypeError: Failed to fetch
That's where I'm currently stuck... what am I missing in order to make a simple-as-possible XMLHttpRequest or fetch work?
I think you're running afoul of cross origin resource sharing (CORS) restrictions here. Try fetching console.html so that it will be a same origin request. Or from a CDN or anything that sets access-control-allow-origin: * as a response header.
>>> from js import fetch
>>> resp = await fetch("console.html")
>>> text = await resp.text()
>>> print(text[:100])
<!doctype html>
<html>
<head>
<meta charset="UTF-8" />
<meta
http-equiv="origin-tria
Ah yep, okay...
>>> r = await js.fetch('https://cdn.jsdelivr.net/pyodide/v0.23.4/full/repodata.json')
>>> t = await r.text()
>>> t[:100]
'{"info": {"arch": "wasm32", "platform": "emscripten_3_1_32", "version": "0.23.4", "python": "3.11.2"'
Okay, well this is neat.
Open the pyodide console, then...
>>> import micropip, ssl, js
>>> await micropip.install('httpx')
>>> import httpx
>>> class JSTransport(httpx.AsyncBaseTransport):
...     async def handle_async_request(self, request):
...         url = str(request.url)
...         options = {
...             'method': request.method,
...             'headers': dict(request.headers),
...             'body': await request.aread(),
...         }
...         fetch_response = await js.fetch(url, options)
...         status_code = fetch_response.status
...         headers = dict(fetch_response.headers)
...         buffer = await fetch_response.arrayBuffer()
...         content = buffer.to_bytes()
...         return httpx.Response(status_code=status_code, headers=headers, content=content)
...
>>> client = httpx.AsyncClient(transport=JSTransport())
>>> r = await client.get('https://cdn.jsdelivr.net/pyodide/v0.23.4/full/repodata.json')
>>> r
<Response [200 OK]>
>>> r.json()
{'info': {'arch': 'wasm32', 'platform': 'emscripten_3_1_32', 'version': '0.23.4', 'python': '3.11.2'}, 'packages': {'asciitree': {'name': 'asciitree', 'version': '0.3.3', ...
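One thing a fuller transport would probably need to handle: the Fetch standard rejects a GET or HEAD request that carries a body, so the options passed to fetch should leave `body` out in those cases. Below is a minimal sketch of just that option-building step, written in plain Python so it runs outside the browser. `build_fetch_options` is a hypothetical helper for illustration, not part of httpx or Pyodide.

```python
def build_fetch_options(method: str, headers: dict, body: bytes) -> dict:
    """Build the options mapping for a fetch() call (hypothetical helper)."""
    options: dict = {"method": method.upper(), "headers": dict(headers)}
    # fetch() raises a TypeError if a GET or HEAD request carries a body,
    # so only attach one for other methods, and only when it is non-empty.
    if body and options["method"] not in ("GET", "HEAD"):
        options["body"] = body
    return options


get_options = build_fetch_options("GET", {"accept": "application/json"}, b"")
post_options = build_fetch_options("POST", {"content-type": "text/plain"}, b"hello")
```

Inside Pyodide the resulting dict would most likely still need converting to a JavaScript object before being handed to js.fetch (for example with pyodide.ffi.to_js and dict_converter=js.Object.fromEntries); that step is left out here so the sketch stays runnable anywhere.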
Dealing with this incrementally, here’s some isolated PRs that I think we should address first…
- Refactor the import of httpcore so that it’s only loaded if HTTPTransport/AsyncHTTPTransport is instantiated.
- Refactor the import of certifi in _config.py so it’s only loaded if SSLContext is instantiated.
- Refactor imports of ssl so that it’s only loaded if SSLContext is instantiated, or is behind a TYPE_CHECKING guard.
(If anyone’s up for tackling these, currently ought to be against the version-1.0 branch, until that’s merged)
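For anyone picking these up, the general pattern being asked for looks roughly like this. It is a sketch of the technique only, not the actual httpx code; `make_ssl_context` is a made-up name for illustration.

```python
from __future__ import annotations

import typing

if typing.TYPE_CHECKING:
    # Seen by type checkers only; nothing is imported here at runtime.
    import ssl


def make_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Create an SSL context, importing ssl only when one is actually needed."""
    # Deferred import: a platform without the ssl module can still import
    # this file, and only fails if a context is actually requested.
    import ssl

    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

The same shape applies to certifi and httpcore: move the import from module level into the function or constructor that first needs it.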
Thanks again for your work here @joemarshall. Here's where I think we're at on this...
- [x] Lazy load certifi & httpcore
- [ ] Import ssl under typechecking branches.
- [ ] Consider introducing JSFetchTransport().
@tomchristie I put in the PR that makes import ssl optional now (#3385)
I updated this PR so it follows on from that PR.
This PR now works as follows: it moves _transports.default to _transports.httpcore, which defines [Async]HTTPCoreTransport, and adds a new module, _transports.jsfetch, which defines [Async]JavascriptFetchTransport. Then _transports/__init__.py defines HTTPTransport as an alias for whichever HTTP backend is in use (i.e. httpcore by default, JS fetch on emscripten).
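Roughly, the aliasing idea is the following. This is a simplified sketch with stand-in classes, not the code from the PR; `select_transport_class` is a hypothetical helper added only to make the choice easy to see.

```python
import sys


class HTTPCoreTransport:
    """Stand-in for the httpcore-backed transport."""


class JavascriptFetchTransport:
    """Stand-in for the JS fetch-backed transport."""


def select_transport_class(platform: str = sys.platform) -> type:
    # On emscripten all networking goes through the browser, so use fetch;
    # everywhere else keep the httpcore backend.
    if platform == "emscripten":
        return JavascriptFetchTransport
    return HTTPCoreTransport


# The public name is just an alias for whichever backend was selected.
HTTPTransport = select_transport_class()
```

Code that already uses HTTPTransport keeps working unchanged on both platforms, which is the point of the alias.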
@tomchristie I updated this to follow the changes in master - I think #3385 is redundant now, as the ssl changes are minimal at this point.
I'll add that I've been playing with httpx on https://pydantic.run over the last few days, both sync and async, and apart from the prints reported in https://github.com/pyodide/pyodide/issues/5381, it seems to be working well otherwise!
Oh darn, can't believe I missed those debug prints. Fixed now.
Thanks so much @joemarshall for fixing those.
@hoodmane or @joemarshall, I'm not sure what the process is (or I'd try to help), but please can we update pyodide to use the head of his PR to avoid those debug print statements confusing users.
@tomchristie anything stopping this being merged?
I think Joe Marshall already sent a pr to Pyodide to update it so when we make another release it will bring it in.
I'm getting this error when using openai in pyodide, looks like the openai SDK is assuming something is bytes when it's actually a memoryview
File "/lib/python3.12/site-packages/openai/_streaming.py", line 147, in __aiter__
async for item in self._iterator:
File "/lib/python3.12/site-packages/openai/_streaming.py", line 160, in __stream__
async for sse in iterator:
File "/lib/python3.12/site-packages/openai/_streaming.py", line 151, in _iter_events
async for sse in self._decoder.aiter_bytes(self.response.aiter_bytes()):
File "/lib/python3.12/site-packages/openai/_streaming.py", line 302, in aiter_bytes
async for chunk in self._aiter_chunks(iterator):
File "/lib/python3.12/site-packages/openai/_streaming.py", line 314, in _aiter_chunks
for line in chunk.splitlines(keepends=True):
^^^^^^^^^^^^^^^^
AttributeError: 'memoryview' object has no attribute 'splitlines'
Any chance the error is related to this PR?
It works fine locally, and it works fine when not using their streaming responses. Happy to give more details or create a separate issue if that helps?
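To illustrate the symptom in isolation: memoryview has no splitlines method, so any consumer that treats a chunk as bytes fails in this way. One possible workaround is to coerce each chunk with bytes() before splitting. This is a sketch of the idea only; `iter_lines` is a hypothetical helper, not code from httpx or the openai SDK, and whether the fix belongs in the transport or the consumer is an open question.

```python
from typing import Iterable, Iterator, Union

Chunk = Union[bytes, bytearray, memoryview]


def iter_lines(chunks: Iterable[Chunk]) -> Iterator[bytes]:
    """Yield lines from a stream of byte-like chunks (hypothetical helper)."""
    for chunk in chunks:
        # bytes() copies a memoryview or bytearray into a real bytes object,
        # which does provide splitlines().
        data = bytes(chunk)
        yield from data.splitlines(keepends=True)


lines = list(iter_lines([memoryview(b"a\nb\n"), b"c"]))
```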
> anything stopping this being merged?
I would suggest not making any PR/changes to httpx, and start by demo'ing an Emscripten transport.
> I would suggest not making any PR/changes to httpx, and start by demo'ing an Emscripten transport.
Isn't that roughly what _transports/jsfetch.py in this PR is? The rest of the changes are adding tests for emscripten and making it auto-select the emscripten transport on emscripten but jsfetch.py is a transport for emscripten?
As a huge fan and user of pyodide, I was really looking forward to this PR adding native support. With this now being closed, what is the recommended/suggested way to use httpx in this context going forward? Thanks for a great library by the way! 😄
Sensible initial options would be a third party package, or tutorial demo'ing an emscripten transport.
Something that we can link to outside of the core package.
@haakonvt We have this branch included in Pyodide's package index so it shouldn't be too hard to use it from there (though I forgot to update the version included in Pyodide 0.27.3 to remove some debug prints).
Heya.
So.... perhaps a warm approach to this might be someone slinging up a demo webpage showing how to use pyodide with httpx, based on the outline here. Just a single webpage demonstrating some care & attention to detail.
As it currently stands a significant pull request like this is a big ask to make without first making that case.
I'd be happy to talk about this in more detail on a developer call if that's something any of y'all would like.
I can totally understand the hesitance to merge a patch like this - merging it means committing to supporting it long term, and it's in aid of a platform that is (at present, at least) a bit niche, and not even Tier 3 CPython supported (... yet 😄). That's a big ask of any maintainer.
I can't speak for @joemarshall or @hoodmane, but as a casual Pyodide/PyScript user, my concern is that needing to manually specify the transport on every Emscripten usage, while workable for demonstration purposes, is onerous as a "mainstream" API usage pattern. Documentation will perhaps elaborate on what the code in this PR is doing; but it won't make actually using httpx in Pyodide any easier for most users (other than maybe proving that it is possible at all).
Taking something like the BeeWare tutorial as an example - we use httpx as part of our cross-platform app building tutorial; while we could add AsyncClient(transport=JSTransport() if sys.platform == 'emscripten' else None) logic to the tutorial, it's a bit messy to have to do so. Platform specific logic like that isn't needed anywhere else in the tutorial - and this is a tutorial that spans 3 desktop and 2 mobile operating systems, plus the browser, and is able to present a native GUI application with dialogs.
So - a middle ground proposal: would you be amenable to adding a plugin interface to httpx (in the form of a PEP 621 entry point) that allows a third-party package to provide an alternate mechanism for determining the default transport? That way, a user would be able to add httpx-js (or similar) to their requirements; by installing that plugin, they’d get the JSTransport defined and set as the default transport when sys.platform == "emscripten".
From httpx's perspective, that reduces the maintenance surface to the API for selecting the default transport. From Pyodide's perspective, “here’s 1 line to put in your requirements and it will work” doesn’t seem like a bad alternative to having the behavior baked in. And having the plugin API doesn't preclude the long term possibility of merging the httpx-js code into httpx core, if/when that seems warranted.
Publishing httpx-js (or whatever it ends up being called) as a standalone package, with "how to use this" manual configuration documentation would be the initial step. Once that's in place, adding a plugin interface and implementation should be a relatively straightforward (and fairly non-invasive) set of changes on both sides.
Does that seem like a viable way to move forward on this?
Whilst my preferred long term solution is to upstream things where possible, I think pyodide as a distribution can make things smoother, by patching httpx at distribution level.
We essentially have 3 levels of integration - urllib3 / requests have upstreamed support completely, so they just work.
Pyarrow supports emscripten natively, and pyodide packages it with some minor patches relating to timezone support.
For httpx, I currently have a fork in pyodide which automatically selects the emscripten transport.
So I guess for me the ideal would be full support, next best is jstransport included and pyodide patching it to default, but if need be we'll just keep a fork.
> would you be amenable to adding a plugin interface to httpx (...) that allows a third-party package to provide an alternate mechanism for determining the default transport?
Perhaps.
A context managed switch at that layer is an interesting idea, and might also be a good route to supporting mock responses.
@freakboy3742's suggestion here seems most viable from my pov...
> Publishing httpx-js (or whatever it ends up being called) as a standalone package, with "how to use this" manual configuration documentation would be the initial step.
Wouldn't have any objections to something along those lines.
import httpxjs
Seems reasonable. A package that just exports httpxjs.Client as public API might be a reasonable start here, perhaps. MVP etc. I'm not currently in a position to prioritise that myself, tho.