Crashes with AssertionError: assert url_item.is_processed

Open ethus3h opened this issue 9 years ago • 2 comments

Hello! grab-site is crashing on one of my jobs. I think it is the same issue, and the same problematic job, as https://github.com/chfoo/wpull/issues/293. It happens in grab-site too, but at a different URL than it did in ArchiveBot. I'm posting this here in case you know how to fix it, even though the bug seems to be upstream in wpull. Thanks :)

https://www.reddit.com/r/usenet/comments/22pn9m/i_am_in_the_us_using_blocknews_does_it_make_sense/%20--ignore-sets=reddit ...
ERROR Fetching ‘http://www.change.org/p/mozilla-don-t-remove-xul-and-xpcom-support-from-add-ons’ encountered an error: DNS resolve timed out.
https://www.reddit.com/r/usenet/comments/22pn9m/i_am_in_the_us_using_blocknews_does_it_make_sense ...
ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/app.py", line 128, in run
    yield From(self._builder.factory['Engine']())
  File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 250, in _step
    result = coro.throw(exc)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 278, in __call__
    yield From(self._run_workers())
  File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 252, in _step
    result = coro.send(value)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 67, in _run_workers
    task.result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/futures.py", line 286, in result
    raise self._exception
  File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 250, in _step
    result = coro.throw(exc)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 146, in _run_worker
    yield From(self._process_item(item))
  File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 250, in _step
    result = coro.throw(exc)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 327, in _process_item
    yield From(self._process_url_item(url_record))
  File "/home/archivebot/.local/lib/python3.4/site-packages/trollius/tasks.py", line 252, in _step
    result = coro.send(value)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 386, in _process_url_item
    assert url_item.is_processed
AssertionError
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.

Finished grab 18a7291ed98433442b0cb9e039bebc92 https://archive.org/download/20151122n10Converted.urlsV4/%282015-11-22n10%29%20converted.urls%20v4.txt with exit code 1
Output is in directory:
/home/archivebot/warcdealer/grabbed/archive.org-download-20151122n10Converted.urlsV4-%282015-11-22n10%29%20converted.urls%20v4.txt-2015-11-25-18a7291e
archivebot@sd-86322:~/warcdealer/grabbed$ 

ethus3h, Nov 25 '15 19:11

The corresponding ArchiveBot issue is https://github.com/chfoo/wpull/issues/293

ethus3h, Nov 25 '15 19:11

https://github.com/chfoo/wpull/issues/293#issuecomment-343675653 appears to be correct: this crash happens when giving grab-site an unsupported URL scheme in the command line, URL list, or accept_url hook.

ivan, Sep 06 '18 10:09
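
If the cause is indeed an unsupported URL scheme, one possible workaround is to screen the URL list before handing it to grab-site. The sketch below is not part of grab-site or wpull; `split_by_scheme` and `ASSUMED_SUPPORTED_SCHEMES` are hypothetical names, and the set of accepted schemes is an assumption you should adjust to whatever your installed version actually supports.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes are treated as safe to pass along.
# Adjust this set to match what your grab-site/wpull version accepts.
ASSUMED_SUPPORTED_SCHEMES = {"http", "https"}


def split_by_scheme(lines, supported=ASSUMED_SUPPORTED_SCHEMES):
    """Split an iterable of URL lines into (ok, rejected) lists.

    Blank lines are skipped. A URL is rejected if it has no scheme,
    a scheme outside `supported`, or cannot be parsed at all.
    """
    ok, rejected = [], []
    for line in lines:
        url = line.strip()
        if not url:
            continue
        try:
            scheme = urlsplit(url).scheme.lower()
        except ValueError:
            # urlsplit raises ValueError on some malformed URLs
            rejected.append(url)
            continue
        if scheme in supported:
            ok.append(url)
        else:
            rejected.append(url)
    return ok, rejected
```

Running the input list through something like this and writing only the `ok` entries to a new file would keep entries such as `mailto:` links or scheme-less lines out of the crawl. It does not cover URLs returned by an accept_url hook, which would need the same check inside the hook.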