pywb icon indicating copy to clipboard operation
pywb copied to clipboard

Upgrade dependencies

Open edsu opened this issue 1 year ago • 12 comments

This is a draft since 4 tests are failing after the upgrades...

Description

Upgrade various dependencies (gevent, werkzeug, markupsafe, jinja2, flask and httpbin) with the goal of fixing builds with Python v3.11 and higher.

Motivation and Context

I spent a bit of time trying to let gevent float to 3.1.2 and working through the effects of running pip install -r requirements.txt.

The good news is gevent v22.10.2 installs under Python3.11 now. But installing the latest werkzeug (which is currently unpinned) brings along the need to upgrade markupsafe and jinja2. I worked through the minimal changes to get that working, which mostly involved using pass_context instead of contextfunction.

The bad news is that the tests installed an old version of flask, which pulled in an old version of werkzeug which breaks things again. The reason why flask < v2 is installed is to get the old werkzeug which is needed by httpbin, which appears to be dead?

I think we could pin back werkzeug in our requirements.txt to solve this. But it seems like we would just be forestalling the inevitable need to upgrade? The approach taken here is to install a fork of httpbin for testing which has been updated to work with the latest werkzeug.

Fixes #835

Types of changes

  • [ ] Replay fix (fixes a replay specific issue)
  • [X] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [ ] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.
  • [ ] I have added or updated tests to cover my changes.
  • [ ] All new and existing tests passed.

edsu avatar Apr 12 '23 12:04 edsu

Hi, thank you for this draft MR.

I am trying to get this project into the gentoo user repository so it would be very valuable to have python 3.10+ support.

For this reason I tried to add two small modifications in order to fix the tests:

  • this is my own fork adding a commit from your branch https://github.com/FraMecca/pywb/tree/upgrade-dependencies: it adds back the pop_path_info function that was removed from werkzeug.wsgi
  • I forked pyamf (required by warcio, so an indirect dependency) to work with python3.11: the C extension is not available anymore but there is replacement code. I think that works better because it is easier to update the code this way: https://github.com/FraMecca/pyamf

FraMecca avatar Jun 02 '23 15:06 FraMecca

Overall there are only 5 tests failing on my branch but they are all related to path rewriting. I need help fixing those because I have no experience with that part of the codebase

platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/user/pywb, configfile: tox.ini, testpaths: tests
plugins: cov-4.1.0
collected 512 items                                                                                                                                                          

tests/test_acl.py ..........                                                                                                                                           [  1%]
tests/test_acl_manager.py .................                                                                                                                            [  5%]
tests/test_auto_colls.py ..........................................                                                                                                    [ 13%]
tests/test_cdx_server_app.py .................                                                                                                                         [ 16%]
tests/test_cert_req.py ..                                                                                                                                              [ 17%]
tests/test_cli.py .........                                                                                                                                            [ 18%]
tests/test_embargo.py ......                                                                                                                                           [ 20%]
tests/test_force_https.py ........                                                                                                                                     [ 21%]
tests/test_integration.py ....................................................................................................                                         [ 41%]
tests/test_live_rewriter.py ..........FFFF.............F......                                                                                                         [ 47%]
tests/test_locales.py ssss                                                                                                                                             [ 48%]
tests/test_manager.py .......                                                                                                                                          [ 50%]
tests/test_memento.py ..............................                                                                                                                   [ 55%]
tests/test_prefer_header.py ...................................................................................                                                        [ 72%]
tests/test_prefixed_deploy.py ......                                                                                                                                   [ 73%]
tests/test_proxy.py ssssssssssssssssssssssssssssssssssssssssssssss                                                                                                     [ 82%]
tests/test_range.py ............                                                                                                                                       [ 84%]
tests/test_record_dedup.py .....                                                                                                                                       [ 85%]
tests/test_record_replay.py ........................                                                                                                                   [ 90%]
tests/test_redirect_classic.py ............                                                                                                                            [ 92%]
tests/test_redirect_revisits.py .......                                                                                                                                [ 93%]
tests/test_redirects.py ...............                                                                                                                                [ 96%]
tests/test_root_coll.py ........                                                                                                                                       [ 98%]
tests/test_socks.py ....                                                                                                                                               [ 99%]
tests/test_zipnum_auto_dir.py ....     

FraMecca avatar Jun 02 '23 15:06 FraMecca

@FraMecca I see 4 tests failing in my Python 3.11 dev environment:

FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[non-frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length_with_range[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 206)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length_with_range[non-frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 206)

I'm pretty sure these are related to me switching to a fork of httpbin. I need to get these tests problems resolved now since I'm having trouble installing greenlet v1 on the latest debian Docker images, which ship with Python 3.11.

edsu avatar Jun 19 '23 15:06 edsu

@edsu so we have the same failures on our branches. I don't understand how you are able to pull that without switching to a py3 version of pyamf. What do you plan to do for httpbin?

FraMecca avatar Jun 28 '23 06:06 FraMecca

@FraMecca I'm currently using a fork of httpbin that has been updated: https://github.com/webrecorder/pywb/blob/upgrade-dependencies/test_requirements.txt#L9

I updated this PR to replace werkzeug.wsgi import pop_path_info with wsgiref.util import shift_path_info since the werkzeug team decided it was duplicated there.

It looks like py3AMF is being installed conditionally in the setup.py. Is that not working for you?

edsu avatar Jun 29 '23 21:06 edsu

I think that perhaps these length tests are failing because spaces are now being removed from JSON responses somewhere in the stack, maybe by the httpbin fork?

Here are a bunch of space related changes I needed to make to get some tests to pass: https://github.com/webrecorder/pywb/commit/5b72d968717db159cc37a56ee430620f1dcfa161

edsu avatar Jun 29 '23 22:06 edsu

@edsu I get 400 instead of 200 on some tests that use httbin, so it might be a different issue

FraMecca avatar Jul 01 '23 05:07 FraMecca

It looks like py3AMF is being installed conditionally in the setup.py. Is that not working for you?

It's needed, for the tests, right? In any case porting the library to py3 wasn't a big effort so I would be glad if it can be of help to you

FraMecca avatar Jul 01 '23 06:07 FraMecca

@FraMecca there is already a Python3 port of pyamf at https://github.com/StdCarrot/Py3AMF. Does that not work for you?

edsu avatar Sep 22 '23 08:09 edsu

It looks like https://github.com/psf/httpbin is being actively maintained by the Python Software Foundation so I'm going to switch to using that.

Sadly, I see it still has the same set of HTTP 400 instead of HTTP 200 errors:

FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)

But at least if a change in httbin is needed it is feasible to get it applied. The original repository is abandoned.

edsu avatar Sep 22 '23 09:09 edsu

Sadly it appears that httpbin is no longer able to return invalid Content-Length headers:

curl -i -X GET "http://localhost:8080/response-headers?content-length=149,149" -H "accept: application/json"
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Type: text/html
Content-Length: 141

<html>
  <head>
    <title>Internal Server Error</title>
  </head>
  <body>
    <h1><p>Internal Server Error</p></h1>

  </body>
</html

This is what appears in the docker console when run against the psf/httpbin:

[2023-09-22 10:21:01 +0000] [7] [ERROR] Error handling request /response-headers?content-length=149,149
Traceback (most recent call last):
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/base_async.py", line 55, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/ggevent.py", line 128, in handle_request
    super().handle_request(listener_name, req, sock, addr)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/base_async.py", line 108, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/flask/app.py", line 2213, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/flask/app.py", line 2197, in wsgi_app
    return response(environ, start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/werkzeug/wrappers/response.py", line 579, in __call__
    start_response(status, headers)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/http/wsgi.py", line 243, in start_response
    self.process_headers(headers)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/http/wsgi.py", line 264, in process_headers
    self.response_length = int(value)
ValueError: invalid literal for int() with base 10: '149,149'

Maybe the older version of httpbin wasn't so strict?

edsu avatar Sep 22 '23 13:09 edsu

Any news on this? We would like to upgrade our old Debian 11 servers to 12 :)

aquatix avatar Feb 23 '24 10:02 aquatix

Any news on this? We would like to upgrade our old Debian 11 servers to 12 :)

Hi, thanks for your patience and to @edsu for starting this branch! Apologies for the long delay on taking this up, we've been very focused on Browsertrix of late and are a small team. I'm planning to merge this into main soon with some other fixes/features and we should put out a new release shortly!

tw4l avatar Mar 29 '24 16:03 tw4l

Pinning gevent==23.9.0 and greenlet>=2.0.2 to specific versions because httpbin requires greenlet < 3 for now. We can update to latest/greenlet 3 when the PSF-maintained httpbin updates, or contribute that separately.

tw4l avatar Mar 29 '24 16:03 tw4l

Note that werkzeug 2.3.0 dropped support for Python 3.7; if we want to continue to support Python < 3.8, we'd need to pin to werkzeug==2.2.3, which does not support Python 3.12 so may become an issue shortly.

tw4l avatar Mar 29 '24 17:03 tw4l

It's also worth noting that werkzeug dropped support for Python 2 and Python 3.5 in version 2.0.0, released in 2021, and we previously didn't have the version pinned so likely our claim to support Python 2 and <3.6 hasn't actually been true for a little while. There are almost certainly some additional things that can be dropped accordingly, but I didn't want the scope of this PR to expand too much so that might be better done in a follow-up.

tw4l avatar Mar 29 '24 18:03 tw4l

We decided to support Python 3.7-3.11 in the next release, and will drop Python 3.7, add Python 3.12, and upgrade dependencies further in the following release. I will make an issue to document what will be needed at that point.

tw4l avatar Apr 01 '24 19:04 tw4l

Great work @tw4l !

aquatix avatar Apr 03 '24 05:04 aquatix