Ivan Kozik
That would be useful but I have no idea how to do this. It may require a new wpull hook.
https://gist.github.com/JustAnotherArchivist/b82f7848e3c14eaf7717b9bd3ff8321a
https://github.com/chfoo/wpull/issues/293#issuecomment-343675653 appears to be correct: this crash happens when giving grab-site an unsupported URL scheme in the command line, URL list, or `accept_url` hook.
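Until that is handled upstream, one possible workaround is to screen a URL list before passing it to grab-site. The sketch below is only an illustration: `split_by_scheme` and `SUPPORTED_SCHEMES` are hypothetical names, not part of grab-site or wpull, and the set of supported schemes is an assumption that should be adjusted to whatever your wpull build actually accepts.

```python
from urllib.parse import urlsplit

# Assumption: the schemes the crawler can handle. Adjust as needed.
SUPPORTED_SCHEMES = frozenset({"http", "https"})


def split_by_scheme(urls, supported=SUPPORTED_SCHEMES):
    """Split URLs into (accepted, rejected) by URL scheme.

    Hypothetical helper: blank lines are skipped, and URLs that
    cannot be parsed or that have no scheme are rejected.
    """
    accepted, rejected = [], []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue
        try:
            scheme = urlsplit(url).scheme.lower()
        except ValueError:
            # urlsplit raises ValueError on some malformed URLs.
            rejected.append(url)
            continue
        if scheme in supported:
            accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected
```

The accepted list can then be written to the file given to grab-site, and the rejected list reviewed by hand. The same scheme check could in principle be applied inside an `accept_url` hook, but that is not shown here.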
This should be filed on wpull. The FTP URL would be helpful as well.
I believe this PR does not make grab-site work with Python 3.10, because we still have incompatibilities preventing an upgrade to Python 3.9 (https://github.com/ArchiveTeam/ludios_wpull/issues/20).
Oh, I see you made changes to wpull in your fork. I guess I would need Python 3.10 support to be upstream in ArchiveTeam/ludios_wpull (or ArchiveTeam/wpull if the useful ludios_wpull...
I don't even know if this is possible to implement nicely (i.e. without breaking any responses that are already being downloaded) with the wpull hooks that exist now.
Thanks for the report. I have confirmed the first issue so far. You are right that the tumblr igset is unhelpfully ignoring non-tumblr domains, including the `t.umblr.com` redirector. I'll see...
I confirmed that problems 1.) and 3.) are fixed in ca8fd22c02885e8e3dfce20b609daaf1dae68e48 (or, for 1, at least t.umblr.com is no longer ignored). Can you please check if 2.) is fixed, or...
I tried with the URLs you gave for 1 and 3, then used `gs-dump-urls` on `wpull.db` to check whether wpull grabbed them. Can you try with `--igon` to confirm that...