pyodide icon indicating copy to clipboard operation
pyodide copied to clipboard

Discussion / WIP: Add a very minimal requests shim and its dependencies

Open bartbroere opened this issue 2 years ago • 13 comments

Description

Currently the Pyodide project can install Python requests, because it is a pure Python library. It will not work however, since it depends on the http library. I think it could be nice to support / maintain a version of requests that proxies any input it gets to javascript XMLHttpRequest, using the js module provided by pyodide itself. I have made a very basic and minimal version of this "requests shim" by patching the Session.request method and the models.Response class.

This can already do some basic things well enough, and works like a charm for my current project, where I don't need everything requests offers. I also think a pyodide version of requests shouldn't necessarily produce the exact same results as CPython requests. The browser already knows quite well which headers and cookies to send for example. Where possible, it's probably best to rely on the browser's opinion about what an acceptable HTTP request should contain. On top of that, some parts of requests are probably impossible to support at all, like the verify=False option for disabling SSL checks.

To see the changes I made in pyodide-requests-shim.patch presented nicely, check the diff at this PR: https://github.com/bartbroere/requests/pull/1

This is definitely still a work-in-progress PR, but I hope it can be a discussion starter: Could a patch of requests that proxies calls to Javascript become a part of the pyodide project?

Checklists

Not all of these steps are necessary for every PR.

  • [ ] Add a CHANGELOG entry or explain why an entry is not needed
  • [ ] Add / update tests
  • [ ] Add new / update outdated documentation

bartbroere avatar Nov 13 '21 15:11 bartbroere

Thanks for working on this @bartbroere ! This goes toward https://github.com/pyodide/pyodide/issues/140

Could a patch of requests that proxies calls to Javascript become a part of the pyodide project?

We clearly need to have better support for the file downloads in Python. This likely means that we would need to patch something, but the question is at what level it should happen. Ideally patching http.client (@hoodmane did some work in https://github.com/pyodide/pyodide/issues/140#issuecomment-897888308) could mean that urllib3 and requests would work without patching. But in practice I'm not convinced that this would really be the case, and maybe we would need to patch requests as you are doing here in any case.

In any case experimenting with this is very useful.

rth avatar Nov 13 '21 17:11 rth

Thanks in advance @bartbroere! I am also interested in patching requests.

Patching http.client would be ideal, but I don't think it's easy to emulate socket behavior fully in javascript. So for a starter, patching the most used high-level HTTP library, requests, would be very helpful.

ryanking13 avatar Nov 13 '21 22:11 ryanking13

Nice to hear that this PR could contribute some functionality, and that patching requests might be the way to achieve this relatively easily.

Shall I continue working on this patch? For a minimal version I think I should address the following issues at least.

  • [x] Use ArrayBuffer instead of Blob for binary responses, ~since Node does not have Blob~ since we convert it to ArrayBuffer anyway
  • [x] Find out if we need to do something special for gzipped responses (Response.iter_content() vs Response.raw). My guess is that the un-gzipped data stream might not be accessible anymore, since the browser will un-gzip it eagerly. Most use cases will need the decompressed version anyway.
  • [x] Make Response.iter_content() obey the supplied chunk_size
  • [x] Implement supplying headers as list of tuples instead of a mapping
  • [x] Implement other possible arguments for the data parameter, not just dictionaries (sent as json), but also list of tuples, ~bytes or a file object~
  • [x] Debug-level logging for anything that works slightly different than normal requests
  • [x] Warning-level logging for anything that isn't supported at all

Edit, added:

And of course all the things that are already in the pull request template (changelog, docs, tests)

Do you agree this is enough for a first version? Am I missing something?

bartbroere avatar Nov 14 '21 15:11 bartbroere

Since requests is a pure python package, maybe making this a separate package (e.g. pyodide-requests) and uploading it to PyPI might be a better solution. Because large patch files are hard to maintain.

Then, we can make calling import pyodide_requests have a side effect of overriding import requests so that other packages using requests can be used without any modification.

ryanking13 avatar Nov 15 '21 00:11 ryanking13

we can make calling import pyodide_requests have a side effect of overriding import requests

Namely by saying

import sys
sys.modules["requests"] = pyodide_requests

hoodmane avatar Nov 15 '21 00:11 hoodmane

we can make calling import pyodide_requests have a side effect of overriding import requests

Namely by saying

import sys
sys.modules["requests"] = pyodide_requests

I'm not sure if having a package like pyodide_requests would result in a good user experience (especially in products like Starboard Notebook). If a user depends on a package like discord.py, they should realize it needs requests and do something like:

import micropip
await micropip.install('discord.py')
await micropip.install('pyodide_requests')
import pyodide_requests  # to get the side effect of the `requests` override

import discord

If we patch requests users can rely on micropip finding the patched package:

import micropip
await micropip.install('discord.py')

import discord

With this solution code can remain a lot more similar to existing Python code.

Because large patch files are hard to maintain.

I think maintaining the patch is doable if you would maintain a fork of the original requests repository. My current setup to get the patch I checked in is:

  1. In the fork of requests there are two branches, main and pyodide
  2. If requests is updated, pull main from upstream requests
  3. Check if there are any new merge conflicts and resolve those
  4. Get the new diff between the heads of the pyodide and main branch and put it in the patches directory
  5. Update the url and sha256 values in packages/requests/meta.yaml

To make the patch more readable, I could also split it in one that patches models.Response and one that patches Session.request.

bartbroere avatar Nov 15 '21 09:11 bartbroere

I'm not sure if having a package like pyodide_requests would result in a good user experience (especially in products like Starboard Notebook).

I agree that it will hurt the user experience. But explicitly informing users that they are using the patched version of requests is somewhat meaningful I guess.

I think patching requests would be a good starter, and if someday it becomes problematic, we can find other options.

ryanking13 avatar Nov 15 '21 10:11 ryanking13

I think maintaining the patch is doable if you would maintain a fork of the original requests repository

If you have a fork with those fixes, IMO there is no point in creating patches. You can tag to make releases and directly point to the tag as the download URL.

Also since it's a Python package, there isn't even a point in packaging it with the Pyodide build system really. We could just create a wheel, upload to PyPi and install from there.

The concern there is about naming, and how it's loaded by micropip. I opened https://github.com/pyodide/micropip/issues/9 to discuss it. If some redirect mechanism is implemented in micropip, so that when micropip.install('requests') actually loads pyodide-requests, you could work on this completely independently from the pyodide release cycle which would be much easier IMO. If you are interested in maintaining this package/fork over time, we could then also move to the pyodide org if you are interested.

rth avatar Nov 21 '21 23:11 rth

If some redirect mechanism is implemented in micropip, so that when micropip.install('requests') actually loads pyodide-requests, you could work on this completely independently from the pyodide release cycle which would be much easier IMO.

This sounds nice. If this redirect mechanism exists, I think the user experience of a requests fork with a different name (like pyodide-requests) would be good. If they simply use micropip.install they wouldn't need to do anything special.

If you are interested in maintaining this package/fork over time, we could then also move to the pyodide org if you are interested.

I'm definitely willing to transfer the fork of requests to the pyodide organisation and maintain it. In its current state I think it's already an improvement over no requests functionality at all, but there are still some use cases left unsupported (most importantly binary responses). Currently it uses synchronous XMLHttpRequest, limiting what responseType can be used. If I were to continue working on it, perhaps I would like to have it use the more modern fetch API offered by Javascript. This might not be necessary for a first version though, as long as we warn users when they hit the not-yet-supported bits.

bartbroere avatar Nov 22 '21 09:11 bartbroere

If I were to continue working on it, perhaps I would like to have it use the more modern fetch API offered by Javascript.

The issue there is we can't currently easily use async fetch API from a sync Python function https://github.com/pyodide/pyodide/issues/140#issuecomment-856810537 and https://github.com/pyodide/pyodide/issues/1503

rth avatar Nov 22 '21 20:11 rth

If I were to continue working on it, perhaps I would like to have it use the more modern fetch API offered by Javascript.

The issue there is we can't currently easily use async fetch API from a sync Python function #140 (comment) and pyodide/pyodide#1503

I think delegating synchronous requests to the Web Worker through pthread/Comlink would be promising, though I didn't look into that much.

ryanking13 avatar Nov 23 '21 00:11 ryanking13

Hello, I was looking into making a requests shim for a personal project when I came across an an interesting requests feature. Instead of rewriting requests as pyodide_requests, would it be possible to write a custom Transport Adapter for the http[s]:// prefix? I have produced a very minimal example that seems to work well without a rewrite:

import requests
import io
import js
import pyodide

def init():
    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        body = request.body
        jsreq = js.XMLHttpRequest.new()
        jsreq.open(request.method.upper(), request.url, False)
        if timeout:
            jsreq.timeout = timeout * 1000
        for header, value in request.headers.items():
            jsreq.setRequestHeader(header, value)

        try:
            jsreq.send(body)
        except pyodide.JsException as err:
            raise requests.HTTPError("A JavaScript exception was raised during the processing of this request: {err.js_error}")

        response = requests.Response()
        response.status_code = jsreq.status
        headers = dict(map(lambda i: tuple(i.split(': ', 1)), jsreq.getAllResponseHeaders().split("\r\n")[:-1]))
        response.headers = headers
        response._content = jsreq.responseText
        response.req = request
        response._jsreq = jsreq
        response.connection = self
        response.url = jsreq.responseURL
        response.reason = "" if " " not in jsreq.statusText else jsreq.statusText.split(" ", 1)[1]

        return response
    requests.adapters.HTTPAdapter.send = send
init()
requests.get("https://httpbin.org/anything")

Obviously, there are still a few edge cases that need to be addressed, and something like this will only work in Workers (because of sync XHR requests). However, overriding the default transport adapter is a viable alternative to a rewrite.

cole-wilson avatar Jan 02 '22 01:01 cole-wilson

However, overriding the default transport adapter is a viable alternative to a rewrite.

Nice! This seems like a very straightforward way to override the behaviour of requests, and have it in a separate package at the same time. Instead of having the user call an init() method however, I think it would be nice to rely on some redirect mechanism as discussed in pyodide/micropip#9. Because my primary use case involves notebooks, I like it if the user has to do as little Pyodide specific tricks as possible.

Last month I didn't have a lot of time to work on this, but working on it before, I became less and less convinced that XMLHttpRequest is the best part of the Javascript API to use. Major drawback is that it can't handle raw responses in synchronous mode easily. The newest pyodide release (0.19) contains work to proxy Javascript's asynchronous fetch (pyfetch in Pyodide). Perhaps the requests shim could also use pyfetch.

bartbroere avatar Jan 02 '22 08:01 bartbroere

Thank you for your work on this @bartbroere !

I think now it has become clear that we do not want such package to be part of the pydide monorepo. Instead it should be in a standalone repository, uploaded to PyPi and installable via micropip. That would be much easier maintenance wise, and allow for faster iteration.

There are for instance https://github.com/koenvo/pyodide-http and https://github.com/emscripten-forge/requests-wasm-polyfill projects in this space.

So I'm going to close this PR, but please don't hesitate to contribute your work to either of those packages (or to create a new one).

rth avatar Sep 18 '22 08:09 rth

Yes, I agree that a patch on top of the original requests is probably not the best solution. Closing this PR is fine.

As a temporary hack it serves me well for now, but I will also migrate our internal code to the polyfill with the webworker approach. Not having to block the main thread is a big advantage!

bartbroere avatar Sep 20 '22 08:09 bartbroere

For anyone visiting this thread now, note that there now is a better solution here: https://github.com/koenvo/pyodide-http

It's also included in default Pyodide builds, so you probably can get requests to work by following the README.

bartbroere avatar Apr 21 '23 15:04 bartbroere