httpx

Emscripten support

joemarshall opened this issue

Summary

I opened a discussion for this ages back but there's been no input, so I've written it (I was contracted to do the work anyway, so I might as well contribute it upstream). This PR adds support for running on Emscripten/WebAssembly platforms, where all network connections go via the browser.

This is still in progress, but tests pass locally, so I've opened it to check the CI changes. I still need to update the docs.

Checklist

  • [x] I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • [x] I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • [x] I've updated the documentation accordingly.

joemarshall, Oct 02 '24

This is really interesting, thanks... ☺️

I've taken a bit of a look at the ecosystem here, though I'm going to need a bit more orientation... Would it make sense to document an example of how to write an HTML page that includes a Python REPL with httpx imported and available?

lovelydinosaur, Oct 02 '24

Cool.

Related https://github.com/python/steering-council/issues/256

zanieb, Oct 04 '24

@tomchristie I've added some docs, including a page that is a live demo, along with instructions for hosting it yourself. If you clone this PR and then run scripts/build and scripts/docs, you should be able to see the Emscripten port working in Chrome (on the advanced/emscripten page).

If this gets merged, I can contribute it to the main Pyodide distribution. Once that's done, import httpx would just work in Pyodide environments.

joemarshall, Oct 07 '24

Okay, really interesting... I've had a bit of a play around with this, though I could do with walking through it from the basics, if you're able to spend the time...

I'd like to start by getting to the point where I can add a custom transport to httpx in the Pyodide console...

Here are my starting steps...

Open up https://pyodide.org/en/latest/console.html

Install httpx. It's not built in; okay, that's expected. It does install with micropip, which makes sense since it's pure Python. Oddly, ssl needs to be imported first(?). After that, httpx imports just fine. 👍

Welcome to the Pyodide 0.27.0.dev0 terminal emulator 🐍
Python 3.12.1 (main, Oct  7 2024 14:46:27) on WebAssembly/Emscripten
Type "help", "copyright", "credits" or "license" for more information.
>>> import httpx
Traceback (most recent call last):
  File "<console>", line 1, in <module>
ModuleNotFoundError: No module named 'httpx'
>>> import micropip
>>> await micropip.install('httpx')
>>> import httpx
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/lib/python3.12/site-packages/httpx/__init__.py", line 2, in <module>
    from ._api import *
  File "/lib/python3.12/site-packages/httpx/_api.py", line 6, in <module>
    from ._client import Client
  File "/lib/python3.12/site-packages/httpx/_client.py", line 12, in <module>
    from ._auth import Auth, BasicAuth, FunctionAuth
  File "/lib/python3.12/site-packages/httpx/_auth.py", line 12, in <module>
    from ._models import Cookies, Request, Response
  File "/lib/python3.12/site-packages/httpx/_models.py", line 11, in <module>
    from ._content import ByteStream, UnattachedStream, encode_request, encode_response
  File "/lib/python3.12/site-packages/httpx/_content.py", line 17, in <module>
    from ._multipart import MultipartStream
  File "/lib/python3.12/site-packages/httpx/_multipart.py", line 8, in <module>
    from ._types import (
  File "/lib/python3.12/site-packages/httpx/_types.py", line 5, in <module>
    import ssl
ModuleNotFoundError: No module named 'ssl'
>>> import ssl
>>> import httpx
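The need to import ssl first is most likely because Pyodide ships ssl as a separately loaded package, and the console only auto-loads packages for imports typed at the prompt, not for imports made from inside an installed library. A minimal sketch of a helper that loads such a module on demand, assuming Pyodide's pyodide_js.loadPackage API and that the Pyodide package name matches the module name (true for ssl, an assumption in general). ensure_stdlib_module is a hypothetical name:

```python
import importlib
import sys
from types import ModuleType


async def ensure_stdlib_module(name: str) -> ModuleType:
    """Import `name`, loading Pyodide's separately shipped package first if needed.

    Hypothetical helper: assumes the Pyodide package name equals the module name.
    """
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        if sys.platform != "emscripten":
            # Outside the browser there is nothing extra to load.
            raise
    import pyodide_js  # only importable inside Pyodide

    await pyodide_js.loadPackage(name)
    importlib.invalidate_caches()  # make sure the newly unpacked files are found
    return importlib.import_module(name)
```

In the console, await ensure_stdlib_module("ssl") would then stand in for the bare import ssl before import httpx.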

The models work as expected...

>>> httpx.Request('GET', 'https://www.example.com')
<Request('GET', 'https://www.example.com/')>

Though we can't send requests, which again is expected...

>>> httpx.get('https://www.example.com')
Traceback (most recent call last):
...
httpx.ConnectError: [Errno 23] Host is unreachable

Okay, so next step: I need to figure out how to make a JS request in this console, so that I can then implement a transport class on top of it. Let's try XMLHttpRequest?...

>>> import js
>>> js_xhr = js.XMLHttpRequest.new()
>>> js_xhr.open('GET', 'http://www.example.com/', False)
>>> js_xhr.send()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
pyodide.ffi.JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'http://www.example.com/'.

Okay, how about using fetch?...

>>> await js.fetch('https://www.example.com/')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
pyodide.ffi.JsException: TypeError: Failed to fetch

That's where I'm currently stuck... what am I missing in order to make a simple-as-possible XMLHttpRequest or fetch work?

lovelydinosaur, Oct 08 '24

I think you're running afoul of cross-origin resource sharing (CORS) restrictions here. Try fetching console.html, so that it's a same-origin request, or fetch from a CDN or any other server that sets access-control-allow-origin: * as a response header.
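A simplified sketch of the check the browser applies, useful for reasoning about which URLs will work from the Pyodide console. It only covers a simple, credential-less fetch: preflight requests, credentials and the other Access-Control-* headers are ignored, and both function names are hypothetical:

```python
from urllib.parse import urlsplit


def serialized_origin(url: str) -> str:
    """Return scheme://host[:port], omitting the port when it is the scheme default."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    host = parts.hostname or ""
    if parts.port is None or parts.port == default_port:
        return f"{parts.scheme}://{host}"
    return f"{parts.scheme}://{host}:{parts.port}"


def response_is_readable(page_url: str, request_url: str, response_headers: dict) -> bool:
    """Simplified CORS check for a simple, credential-less fetch() made from page_url."""
    page_origin = serialized_origin(page_url)
    if page_origin == serialized_origin(request_url):
        return True  # same-origin responses are always readable
    # Header names are case-insensitive, so normalise them before the lookup.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    return allowed == "*" or allowed == page_origin
```

This lines up with what happens in this thread: console.html (same origin) and a CDN that sends access-control-allow-origin: * can be read, while the www.example.com request failed.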

hoodmane, Oct 08 '24

>>> from js import fetch
>>> resp = await fetch("console.html")
>>> text = await resp.text()
>>> print(text[:100])
<!doctype html>
<html>
  <head>
    <meta charset="UTF-8" />
    <meta
      http-equiv="origin-tria

hoodmane, Oct 08 '24

Ah yep, okay...

>>> r = await js.fetch('https://cdn.jsdelivr.net/pyodide/v0.23.4/full/repodata.json')
>>> t = await r.text()
>>> t[:100]
'{"info": {"arch": "wasm32", "platform": "emscripten_3_1_32", "version": "0.23.4", "python": "3.11.2"'

lovelydinosaur, Oct 08 '24

Okay, well this is neat.

Open the pyodide console, then...

>>> import micropip, ssl, js
>>> await micropip.install('httpx')
>>> import httpx
>>> class JSTransport(httpx.AsyncBaseTransport):
    async def handle_async_request(self, request):
        url = str(request.url)
        options = {
            'method': request.method,
            'headers': dict(request.headers),
            'body': await request.aread(),
        }
        fetch_response = await js.fetch(url, options)
        status_code = fetch_response.status
        headers = dict(fetch_response.headers)
        buffer = await fetch_response.arrayBuffer()
        content = buffer.to_bytes()
        return httpx.Response(status_code=status_code, headers=headers, content=content)
>>> client = httpx.AsyncClient(transport=JSTransport())
>>> r = await client.get('https://cdn.jsdelivr.net/pyodide/v0.23.4/full/repodata.json')
>>> r
<Response [200 OK]>
>>> r.json()
{'info': {'arch': 'wasm32', 'platform': 'emscripten_3_1_32', 'version': '0.23.4', 'python': '3.11.2'}, 'packages': {'asciitree': {'name': 'asciitree', 'version': '0.3.3', ...
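One caveat with the transport above, as far as we understand Pyodide's conversion rules: a plain Python dict passed to js.fetch arrives as a PyProxy rather than a JS object, so fetch most likely ignores the method, headers and body, and the request goes out as a plain GET (which is all this demo needed). A hedged sketch of a more careful version, assuming pyodide.ffi.to_js with dict_converter, that fetch() rejects a body on GET/HEAD, and that the browser has already decompressed the body (so Content-Encoding must be dropped before httpx tries to decode it again). The helper names are hypothetical, and the class is only defined when running under Emscripten:

```python
import sys

# Headers describing the wire format, which fetch() has already undone.
_STRIP_RESPONSE_HEADERS = {"content-encoding", "content-length", "transfer-encoding"}


def build_fetch_options(method: str, headers: dict, body: bytes) -> dict:
    """Build the init dict for fetch(). GET/HEAD requests must not carry a body."""
    options = {"method": method, "headers": headers}
    if body and method.upper() not in ("GET", "HEAD"):
        options["body"] = body
    return options


def clean_response_headers(headers: dict) -> dict:
    """Drop headers that no longer match the already-decoded body."""
    return {k: v for k, v in headers.items() if k.lower() not in _STRIP_RESPONSE_HEADERS}


if sys.platform == "emscripten":
    import httpx
    import js
    from pyodide.ffi import to_js

    class JSTransport(httpx.AsyncBaseTransport):
        async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
            options = build_fetch_options(
                request.method, dict(request.headers), await request.aread()
            )
            # Convert the dict (including the nested headers dict) into plain JS objects.
            js_options = to_js(options, dict_converter=js.Object.fromEntries)
            fetch_response = await js.fetch(str(request.url), js_options)
            buffer = await fetch_response.arrayBuffer()
            return httpx.Response(
                status_code=fetch_response.status,
                headers=clean_response_headers(dict(fetch_response.headers)),
                content=buffer.to_bytes(),
            )
```

Usage is unchanged: client = httpx.AsyncClient(transport=JSTransport()).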

lovelydinosaur, Oct 08 '24

Dealing with this incrementally, here are some isolated PRs that I think we should address first…

  • Refactor the import of httpcore so that it’s only loaded if HTTPTransport/AsyncHTTPTransport is instantiated.
  • Refactor the import of certifi in _config.py so it’s only loaded if SSLContext is instantiated.
  • Refactor imports of ssl so that it’s only loaded if SSLContext is instantiated, or is behind a TYPE_CHECKING guard.

(If anyone’s up for tackling these, they should currently target the version-1.0 branch, until that’s merged.)
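A minimal sketch of the lazy-import pattern behind the ssl item; the httpcore and certifi items follow the same shape, moving the import into the code path that first needs it. create_ssl_context is a hypothetical function, not httpx's actual API:

```python
import typing

if typing.TYPE_CHECKING:
    # Only type checkers see this import; it never runs when the module is imported.
    import ssl


def create_ssl_context(verify: bool = True) -> "ssl.SSLContext":
    """Build an SSL context, importing ssl only when one is actually requested."""
    import ssl  # deferred, so importing this module works where ssl isn't loaded yet

    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```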

lovelydinosaur, Oct 09 '24

Thanks again for your work here @joemarshall. Here's where I think we're at on this...

  • [x] Lazy load certifi & httpcore
  • [ ] Import ssl under typechecking branches.
  • [ ] Consider introducing JSFetchTransport().

lovelydinosaur, Oct 29 '24

@tomchristie I've opened the PR that makes import ssl optional (#3385).

I updated this PR so it follows on from that PR.

The PR now works like this: it moves _transports.default into _transports.httpcore, which defines [Async]HTTPCoreTransport, and adds a new module, _transports.jsfetch, which defines [Async]JavascriptFetchTransport. Then _transports/__init__.py defines HTTPTransport as an alias for whichever HTTP backend is in use (i.e. httpcore by default, JS fetch on Emscripten).
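The alias described above could look roughly like this. The class bodies are placeholder stubs and default_transport_class is a hypothetical helper; only the class and module names come from this PR, and the Async variants would be selected the same way:

```python
import sys


class HTTPCoreTransport:
    """Stub standing in for the httpcore-backed transport (_transports/httpcore.py)."""


class JavascriptFetchTransport:
    """Stub standing in for the browser fetch() transport (_transports/jsfetch.py)."""


def default_transport_class(platform: str = sys.platform) -> type:
    """Pick the HTTP backend class for the given platform."""
    if platform == "emscripten":
        # No sockets in the browser, so every request has to go through fetch().
        return JavascriptFetchTransport
    return HTTPCoreTransport


# What _transports/__init__.py would expose under the existing public name.
HTTPTransport = default_transport_class()
```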

joemarshall, Oct 31 '24

@tomchristie I updated this to follow the changes in master. I think #3385 is redundant now, as the ssl changes are minimal at this point.

joemarshall, Nov 13 '24

I'll add that I've been playing with httpx on https://pydantic.run over the last few days, both sync and async, and apart from the debug prints reported in https://github.com/pyodide/pyodide/issues/5381, it seems to be working well!

samuelcolvin, Jan 28 '25

Oh darn, can't believe I missed those debug prints. Fixed now.

joemarshall, Jan 29 '25

Thanks so much @joemarshall for fixing those.

@hoodmane or @joemarshall, I'm not sure what the process is (or I'd try to help), but please can we update Pyodide to use the head of this PR, to avoid those debug print statements confusing users?

@tomchristie anything stopping this being merged?

samuelcolvin, Feb 08 '25

I think Joe Marshall has already sent a PR to Pyodide to update it, so our next release will bring it in.

hoodmane, Feb 08 '25

I'm getting this error when using openai in Pyodide; it looks like the openai SDK assumes something is bytes when it's actually a memoryview:

  File "/lib/python3.12/site-packages/openai/_streaming.py", line 147, in __aiter__
    async for item in self._iterator:
  File "/lib/python3.12/site-packages/openai/_streaming.py", line 160, in __stream__
    async for sse in iterator:
  File "/lib/python3.12/site-packages/openai/_streaming.py", line 151, in _iter_events
    async for sse in self._decoder.aiter_bytes(self.response.aiter_bytes()):
  File "/lib/python3.12/site-packages/openai/_streaming.py", line 302, in aiter_bytes
    async for chunk in self._aiter_chunks(iterator):
  File "/lib/python3.12/site-packages/openai/_streaming.py", line 314, in _aiter_chunks
    for line in chunk.splitlines(keepends=True):
                ^^^^^^^^^^^^^^^^
AttributeError: 'memoryview' object has no attribute 'splitlines'

Any chance the error is related to this PR?

It works fine locally, and it also works fine when not using their streaming responses. Happy to give more details or create a separate issue if that helps.
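One plausible cause, offered as an assumption rather than a diagnosis: httpx's byte streams are typed to yield bytes, so if the fetch-based transport yields memoryview chunks (for example, views over a JS buffer), code such as openai's SSE decoder that calls bytes methods like splitlines will fail. A minimal sketch of normalising chunks at the transport's stream boundary; as_bytes_chunks is a hypothetical name:

```python
from typing import AsyncIterator, Union

Chunk = Union[bytes, bytearray, memoryview]


async def as_bytes_chunks(chunks: AsyncIterator[Chunk]) -> AsyncIterator[bytes]:
    """Re-yield every chunk as an immutable bytes object."""
    async for chunk in chunks:
        # bytes(...) copies a memoryview/bytearray into the type callers expect.
        yield chunk if isinstance(chunk, bytes) else bytes(chunk)
```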

samuelcolvin, Feb 08 '25

> anything stopping this being merged?

I would suggest not making any PR/changes to httpx, and start by demo'ing an Emscripten transport.

lovelydinosaur, Feb 08 '25

> I would suggest not making any PR/changes to httpx, and start by demo'ing an Emscripten transport.

Isn't that roughly what _transports/jsfetch.py in this PR is? The rest of the changes add tests for Emscripten and make httpx auto-select the Emscripten transport when running on Emscripten, but isn't jsfetch.py itself just a transport for Emscripten?

joemarshall, Feb 25 '25

As a huge fan and user of pyodide, I was really looking forward to this PR adding native support. With this now being closed, what is the recommended/suggested way to use httpx in this context going forward? Thanks for a great library by the way! 😄

haakonvt, Feb 26 '25

Sensible initial options would be a third-party package, or a tutorial demo'ing an Emscripten transport.

Something that we can link to outside of the core package.

lovelydinosaur, Feb 26 '25

@haakonvt We have this branch included in Pyodide's package index so it shouldn't be too hard to use it from there (though I forgot to update the version included in Pyodide 0.27.3 to remove some debug prints).

hoodmane, Feb 26 '25

Heya.

So... perhaps a warm approach to this might be someone slinging up a demo webpage showing how to use Pyodide with httpx, based on the outline here. Just a single webpage demonstrating some care & attention to detail.

As it currently stands a significant pull request like this is a big ask to make without first making that case.

I'd be happy to talk about this in more detail on a developer call if that's something any of y'all would like.

lovelydinosaur, Feb 27 '25

I can totally understand the hesitance to merge a patch like this - merging it means committing to supporting it long term, and it's in aid of a platform that is (at present, at least) a bit niche, and not even Tier 3 CPython supported (... yet 😄). That's a big ask of any maintainer.

I can't speak for @joemarshall or @hoodmane, but as a casual Pyodide/PyScript user, my concern is that needing to manually specify the transport on every Emscripten usage, while workable for demonstration purposes, is onerous as a "mainstream" API usage pattern. Documentation will perhaps elaborate on what the code in this PR is doing; but it won't make actually using httpx in Pyodide any easier for most users (other than maybe proving that it is possible at all).

Taking something like the BeeWare tutorial as an example: we use httpx as part of our cross-platform app building tutorial; while we could add AsyncClient(transport=JSTransport() if sys.platform == 'emscripten' else None) logic to the tutorial, it's a bit messy to have to do so. Platform-specific logic like that isn't needed anywhere else in the tutorial, and this is a tutorial that spans 3 desktop and 2 mobile operating systems, plus the browser, and is able to present a native GUI application with dialogs.

So, a middle-ground proposal: would you be amenable to adding a plugin interface to httpx (in the form of a PEP 621 entry point) that allows a third-party package to provide an alternate mechanism for determining the default transport? That way, a user would be able to add httpx-js (or similar) to their requirements; by installing that plugin, they’d get the JSTransport defined and set as the default transport when sys.platform == "emscripten".

From httpx's perspective, that reduces the maintenance surface to the API for selecting the default transport. From Pyodide's perspective, “here’s 1 line to put in your requirements and it will work” doesn’t seem like a bad alternative to having the behavior baked in. And having the plugin API doesn't preclude the long term possibility of merging the httpx-js code into httpx core, if/when that seems warranted.

Publishing httpx-js (or whatever it ends up being called) as a standalone package, with "how to use this" manual configuration documentation would be the initial step. Once that's in place, adding a plugin interface and implementation should be a relatively straightforward (and fairly non-invasive) set of changes on both sides.

Does that seem like a viable way to move forward on this?

freakboy3742, Feb 28 '25

Whilst my preferred long term solution is to upstream things where possible, I think pyodide as a distribution can make things smoother, by patching httpx at distribution level.

We essentially have 3 levels of integration - urllib3 / requests have upstreamed support completely, so they just work.

Pyarrow supports Emscripten natively, and Pyodide packages it with some minor patches relating to timezone support.

For httpx, I currently have a fork in Pyodide that automatically selects the Emscripten transport.

So I guess for me the ideal would be full support; next best is the JS transport included in httpx, with Pyodide patching it in as the default; but if need be we'll just keep a fork.

joemarshall, Feb 28 '25

> would you be amenable to adding a plugin interface to httpx (...) that allows a third-party package to provide an alternate mechanism for determining the default transport?

Perhaps.

A context managed switch at that layer is an interesting idea, and might also be a good route to supporting mock responses.
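A minimal sketch of what a context-managed switch for the default transport might look like, using contextvars so the override is scoped to the current task or thread. All names are hypothetical, and a transport factory here is just any zero-argument callable:

```python
import contextlib
import contextvars
from typing import Callable, Iterator, Optional

TransportFactory = Callable[[], object]

# Holds the factory a client would use when no transport= argument is given.
_default_transport_factory = contextvars.ContextVar("default_transport_factory", default=None)


@contextlib.contextmanager
def default_transport(factory: TransportFactory) -> Iterator[None]:
    """Temporarily make `factory` the source of default transports."""
    token = _default_transport_factory.set(factory)
    try:
        yield
    finally:
        # Restore whatever was active before, even if the body raised.
        _default_transport_factory.reset(token)


def current_default_transport_factory() -> Optional[TransportFactory]:
    """What a client constructor would consult when no transport= is passed."""
    return _default_transport_factory.get()
```

For the mock-response case, a test could wrap its code in with default_transport(lambda: httpx.MockTransport(handler)): ..., using httpx's existing MockTransport.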

lovelydinosaur, Feb 28 '25

@freakboy3742's suggestion here seems most viable from my pov...

> Publishing httpx-js (or whatever it ends up being called) as a standalone package, with "how to use this" manual configuration documentation would be the initial step.

Wouldn't have any objections to something along those lines.

import httpxjs

Seems reasonable. A package that just exports httpxjs.Client as public API might be a reasonable start here, perhaps. MVP etc. I'm not currently in a position to prioritise that myself, tho.

lovelydinosaur, May 20 '25