pywb
Fails to record when in http proxy mode
Describe the bug
pywb can't serve a request in HTTP proxy mode while also recording it to a WARC file. Hopefully I'm missing something simple!
Steps to reproduce the bug
I'm trying to set up the HTTP proxy to record WARC files. When I set my config.yaml to this:
proxy:
  coll: test2
  recording: true
pywb starts without any error, but when I make a request with ALL_PROXY=http://localhost:8080 curl example.com, I get an HTTP 400 response whose body is <p>Collection not found: <b>test2</b></p>.
So I ran wb-manager init proxied, changed coll to proxied, and ran the same request again. This time I get an HTML page that says "Pywb Error" and shows this error:
{'args': {'coll': 'proxied', 'type': 'record', 'metadata': {}, 'cache': 'default'}, 'error': '{"error": "HTTPError(\'404 Client Error: No Resource Found for url: http://localhost:40369/live/resource/postreq?param.recorder.coll=proxied&url=http%3A%2F%2Fexample.com%2F&closest=now&matchType=exact\')"}'}
and the logs are:
$ pywb
2023-06-05 04:19:04,234: [INFO]: Proxy recording into collection "proxied"
2023-06-05 04:19:04,356: [INFO]: Starting Gevent Server on 8080
127.0.0.1 - - [2023-06-05 04:19:09] "POST /live/resource/postreq?param.recorder.coll=proxied&url=http%3A%2F%2Fexample.com%2F&closest=now&matchType=exact HTTP/1.1" 404 215 0.000875
127.0.0.1 - - [2023-06-05 04:19:09] "POST /live/resource/postreq?param.recorder.coll=proxied&url=http%3A%2F%2Fexample.com%2F&closest=now&matchType=exact HTTP/1.1" 400 336 0.004111
127.0.0.1 - - [2023-06-05 04:19:09] "GET http://example.com/ HTTP/1.1" 400 1779 0.076372
$ lsof -iTCP
pywb 1700596 root 6u IPv4 455674019 0t0 TCP localhost:40369 (LISTEN)
pywb 1700596 root 7u IPv4 455674024 0t0 TCP localhost:45495 (LISTEN)
pywb 1700596 root 8u IPv4 455674027 0t0 TCP *:http-alt (LISTEN)
Expected behavior
pywb should return the correct HTML response and record the exchange to a WARC file that I can see on disk.
Environment
- OS: Ubuntu 22.04.1 LTS, Linux 5.15.0-25-generic
- Python env:
conda list
List of packages in environment: "/root/micromamba/envs/proxy"
Name              Version     Build           Channel
─────────────────────────────────────────────────────────────
_libgcc_mutex     0.1         conda_forge     conda-forge
_openmp_mutex     4.5         2_gnu           conda-forge
c-ares            1.19.0      h5eee18b_0
ca-certificates   2023.01.10  h06a4308_0
certifi           2022.12.7   py37h06a4308_0
cffi              1.15.1      py37h5eee18b_3
gevent            21.12.0     py37haa10bde_2  conda-forge
greenlet          1.1.3       py37h6a678d5_0
ld_impl_linux-64  2.38        h1181459_1
libev             4.33        h7f8727e_1
libffi            3.4.4       h6a678d5_0
libgcc-ng         13.1.0      he5830b7_0      conda-forge
libgomp           13.1.0      he5830b7_0      conda-forge
libstdcxx-ng      11.2.0      h1234567_1
libuv             1.44.2      h5eee18b_0
ncurses           6.4         h6a678d5_0
openssl           1.1.1t      h7f8727e_0
pip               22.3.1      py37h06a4308_0
pycparser         2.21        pyhd3eb1b0_0
python            3.7.16      h7a1cb2a_0
python_abi        3.7         2_cp37m         conda-forge
readline          8.2         h5eee18b_0
setuptools        65.6.3      py37h06a4308_0
sqlite            3.41.2      h5eee18b_0
tk                8.6.12      h1ccaba5_0
wheel             0.38.4      py37h06a4308_0
xz                5.4.2       h5eee18b_0
zlib              1.2.13      h5eee18b_0
zope              1.0         py37_1
zope.event        4.5.0       py37_0
zope.interface    5.4.0       py37h7f8727e_0
I had a similar error; I needed to add the $live collection to config.yaml.
(The error message shows that pywb is trying to reach http://localhost:40369/live/.)
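A minimal sketch of the config this reply describes, assuming pywb is started from the directory where wb-manager init proxied created the "proxied" collection. The live: $live entry defines pywb's live-web collection. recorder: live is pywb's recording-mode setting that names live as the source to record from; it is spelled out here only for clarity, since the 404 URL suggests pywb already routes proxy recording through /live/. Note that this snippet overwrites any existing config.yaml.

```shell
#!/bin/sh
# Sketch only: assumes pywb runs from this directory and that
# `wb-manager init proxied` has already been run here.
# WARNING: overwrites any existing config.yaml.
# The quoted 'EOF' delimiter stops the shell from expanding $live.
cat > config.yaml <<'EOF'
collections:
  # Live-web collection that the recorder fetches through (/live/ route).
  live: $live

proxy:
  coll: proxied
  recording: true

# Use the live collection as the recording source.
recorder: live
EOF

# Show the resulting config.
cat config.yaml
```

After restarting pywb with this config, rerunning ALL_PROXY=http://localhost:8080 curl example.com should return the page, and a new WARC file should appear under collections/proxied/archive/, which is where pywb keeps a collection's WARCs by default.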