grab-site
wpull crash when http_proxy is set
grab-site suddenly stopped working and hasn't worked since.
I've tried uninstalling and then reinstalling both wpull and grab-site, but it still doesn't work.
Please help!
Traceback (most recent call last):
  File "/Users/aaa/gs-venv/lib/python3.7/site-packages/wpull/application/app.py", line 157, in run
    yield from pipeline.process()
  File "/Users/aaa/gs-venv/lib/python3.7/site-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File "/Users/aaa/gs-venv/lib/python3.7/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File "/Users/aaa/gs-venv/lib/python3.7/site-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File "/Users/aaa/gs-venv/lib/python3.7/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/coroutines.py", line 120, in coro
    res = func(*args, **kw)
  File "/Users/aaa/gs-venv/lib/python3.7/site-packages/wpull/application/tasks/network.py", line 21, in process
    self._build_connection_pool(session)
  File "/Users/aaa/gs-venv/lib/python3.7/site-packages/wpull/application/tasks/network.py", line 85, in _build_connection_pool
    http_proxy = session.args.http_proxy.split(':', 1)
AttributeError: 'NoneType' object has no attribute 'split'
CRITICAL Sorry, Wpull unexpectedly crashed.
Disconnected from ws:// server: RuntimeError('Event loop is closed')
Exception ignored in: <coroutine object sender at 0x10e9d7ac8>
Traceback (most recent call last):
  File "/Users/aaa/gs-venv/lib/python3.7/site-packages/libgrabsite/dashboard_client.py", line 54, in sender
    await asyncio.sleep(delay)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/tasks.py", line 566, in sleep
    future, result)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 657, in call_later
    context=context)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 667, in call_at
    self._check_closed()
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 480, in _check_closed
    raise RuntimeError('Event loop is closed')
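The crash comes from the line that splits the proxy setting on its first colon: session.args.http_proxy was None at that point, and None has no split method. Below is a minimal sketch of that parse with a guard in front of it, assuming the setting is a plain host:port string. parse_proxy is a hypothetical helper for illustration, not wpull's actual code.

```python
from typing import Optional, Tuple


def parse_proxy(value: Optional[str]) -> Optional[Tuple[str, int]]:
    """Split a 'host:port' proxy setting into (host, port).

    Hypothetical helper: it mirrors the failing value.split(':', 1) call from
    the traceback, but returns None when no proxy is configured instead of
    raising AttributeError on None.
    """
    if not value:
        # No proxy configured; calling .split on None is what crashed wpull.
        return None
    # maxsplit=1 splits on the first colon only, like the line in the traceback.
    parts = value.split(':', 1)
    if len(parts) != 2 or not parts[1].isdigit():
        raise ValueError(f'expected host:port, got {value!r}')
    host, port = parts
    return host, int(port)


if __name__ == '__main__':
    print(parse_proxy('0.0.0.0:16379'))
    print(parse_proxy(None))
```

With the value used later in this thread, parse_proxy('0.0.0.0:16379') returns ('0.0.0.0', 16379), and parse_proxy(None) returns None rather than crashing. Splitting on the first colon, as the original line does, would mis-handle an unbracketed IPv6 address; this sketch does not try to cover that case.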
That http_proxy = session.args.http_proxy.split(':', 1) line (the AttributeError means session.args.http_proxy was None at that point) makes me think something set an environment variable to use an HTTP proxy.
Try env | grep -i proxy and maybe unset the variable?
Please let me know if it's not that.
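For checking from inside the same Python environment that grab-site runs in, here is a rough equivalent of env | grep -i proxy. find_proxy_vars is a hypothetical helper; it only matches variable names, whereas grep also matches values.

```python
import os
from typing import Dict, Mapping, Optional


def find_proxy_vars(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return environment variables whose names contain 'proxy', ignoring case.

    Hypothetical helper, roughly like `env | grep -i proxy`, except that it
    only looks at variable names (grep would also match values).
    """
    if environ is None:
        environ = os.environ
    return {name: value for name, value in environ.items()
            if 'proxy' in name.lower()}


if __name__ == '__main__':
    # Print each match as NAME=value, the same shape `env` uses.
    for name, value in sorted(find_proxy_vars().items()):
        print(f'{name}={value}')
```

Matching case-insensitively matters because both http_proxy and HTTP_PROXY spellings are in common use. Python's standard library also offers urllib.request.getproxies(), which reads the same *_proxy variables and, on macOS, can fall back to system proxy settings; whether wpull consults either source is not established in this thread.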
I'm receiving an identical error after setting wpull's proxy with --wpull-args="--http-proxy=0.0.0.0:16379".
Unfortunately, env | grep -i proxy doesn't return anything, even when I run it inside the container that grab-site is running in.
~~Even after removing --wpull-args, grab-site seems to crash with the same event loop error when attempting to crawl.~~ In my case, reinstalling grab-site fixed this. I've also switched to a dockerized grab-site to make it easier to spin up fresh environments for testing.
Since I'd eventually like to bring full onion-archiving capabilities to grab-site, I've decided to first make sure my wget onion-archiving configuration can be ported to wpull.
I've opened an issue in wpull about the problems I'm having using proxies there. Assuming I can get that cleared up, I'll take another look at the proxy errors we're seeing in grab-site. grab-site appears to run a fork of wpull, so I'm wondering whether our proxy issue is specific to that fork or to the plugins grab-site introduces.