grab-site
Crashes with AssertionError: assert url_item.is_processed
Hello! grab-site is crashing on one of my jobs. I think it is the same issue, and the same problematic job, as https://github.com/chfoo/wpull/issues/293. It happens in grab-site too, but at a different URL than it did in ArchiveBot. I'm posting this here in case you know how to fix it, even though the bug seems to be upstream in wpull. Thanks :)
https://www.reddit.com/r/usenet/comments/22pn9m/i_am_in_the_us_using_blocknews_does_it_make_sense/%20--ignore-sets=reddit ...
ERROR Fetching ‘http://www.change.org/p/mozilla-don-t-remove-xul-and-xpcom-support-from-add-ons’ encountered an error: DNS resolve timed out.
https://www.reddit.com/r/usenet/comments/22pn9m/i_am_in_the_us_using_blocknews_does_it_make_sense ...
ERROR Fatal exception.
Traceback (most recent call last):
File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/app.py", line 128, in run
yield From(self._builder.factory['Engine']())
File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 250, in _step
result = coro.throw(exc)
File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 278, in __call__
yield From(self._run_workers())
File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 252, in _step
result = coro.send(value)
File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 67, in _run_workers
task.result()
File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/futures.py", line 286, in result
raise self._exception
File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 250, in _step
result = coro.throw(exc)
File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 146, in _run_worker
yield From(self._process_item(item))
File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 250, in _step
result = coro.throw(exc)
File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 327, in _process_item
yield From(self._process_url_item(url_record))
File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 252, in _step
result = coro.send(value)
File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 386, in _process_url_item
assert url_item.is_processed
AssertionError
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
Finished grab 18a7291ed98433442b0cb9e039bebc92 https://archive.org/download/20151122n10Converted.urlsV4/%282015-11-22n10%29%20converted.urls%20v4.txt with exit code 1
Output is in directory:
/home/archivebot/warcdealer/grabbed/archive.org-download-20151122n10Converted.urlsV4-%282015-11-22n10%29%20converted.urls%20v4.txt-2015-11-25-18a7291e
archivebot@sd-86322:~/warcdealer/grabbed$
ArchiveBot issue is https://github.com/chfoo/wpull/issues/293
https://github.com/chfoo/wpull/issues/293#issuecomment-343675653 appears to be correct: this crash happens when grab-site is given a URL with an unsupported scheme, whether on the command line, in a URL list, or from an accept_url hook.
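If that diagnosis holds, one workaround is to screen the URL list before starting the job. Below is a minimal sketch, not anything shipped with grab-site or wpull: `split_by_scheme` and `SUPPORTED_SCHEMES` are hypothetical names, and the set of supported schemes (http, https, ftp) is an assumption you should check against your wpull version.

```python
from urllib.parse import urlsplit

# Assumption: schemes the crawler can fetch. Adjust for your wpull version.
SUPPORTED_SCHEMES = {"http", "https", "ftp"}


def split_by_scheme(urls, supported=SUPPORTED_SCHEMES):
    """Split an iterable of URL strings into (kept, rejected) lists.

    A URL is kept only if its scheme is in `supported`. Blank lines are
    skipped; URLs that cannot be parsed are rejected.
    """
    kept, rejected = [], []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # ignore blank lines in the list
        try:
            scheme = urlsplit(url).scheme.lower()
        except ValueError:
            # urlsplit raises ValueError on some malformed inputs,
            # for example a broken IPv6 host literal
            rejected.append(url)
            continue
        if scheme in supported:
            kept.append(url)
        else:
            rejected.append(url)
    return kept, rejected
```

You could run this over each line of the URL list, write the kept URLs to a new file, and pass that file to grab-site instead. Reviewing the rejected list by hand should also show which entries were likely to trigger the assertion.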