Multiple --wpull-args options don't seem to be respected

Open ethus3h opened this issue 8 years ago • 5 comments

When using this:

a() { cd /home/grabbot/grabs/ && grab-site --no-dupespotter --concurrency=5 --wpull-args=--warc-move=/home/grabbot/warcdealer/\ --phantomjs-scroll=50000\ --phantomjs-exe=/phantomjs-1.9.8-linux-x86_64/bin/phantomjs\ --content-on-error "$@"; }

Doing this:

a http://fanzub.com/ --concurrency=1 --delay=3000-10000 --wpull-args="--retry-connrefused --retry-dns-error --tries=1000"

doesn't seem to respect the --content-on-error argument.

Is this intended behavior? Thanks!

ethus3h avatar Feb 26 '16 01:02 ethus3h

Indeed, it takes only the last --wpull-args. I'll leave this open until I figure out whether they can/should be combined if used multiple times.

ivan avatar Feb 26 '16 02:02 ivan

Are --retry-connrefused --retry-dns-error something that grab-site should have on by default?

ivan avatar Feb 26 '16 02:02 ivan

Yes please!

rwoodpecker avatar Feb 26 '16 04:02 rwoodpecker

Regarding --retry-connrefused --retry-dns-error: Not sure; if a user wants them, the user can just add them. How hard is it to remove arguments that are there by default?

I'd like to have something like:

grab-site --wpull-args="--foo=1 --bar --baz=qux" http://example.org --remove-wpull-args="--baz" --append-wpull-args="--foo=2 --blah"

and have it run like:

grab-site --wpull-args="--foo=2 --bar --blah" http://example.org

To preserve backward compatibility, the current behavior of respecting only the final --wpull-args option should probably be retained.
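
The remove/append semantics proposed above can be sketched in a few lines of Python. This is only an illustration of the idea, not grab-site code: `edit_wpull_args` and `option_name` are hypothetical helpers, and the sketch assumes every option is written as a single `--name` or `--name=value` token (an option followed by a separate value token would not be handled).

```python
import shlex


def option_name(token):
    """Return the option name without any '=value' suffix."""
    return token.split("=", 1)[0]


def edit_wpull_args(base, remove="", append=""):
    """Apply hypothetical remove/append edits to a wpull argument string.

    Options named in `remove` are dropped. Options in `append` replace an
    existing option with the same name in place, or are added at the end.
    """
    tokens = shlex.split(base)

    # Drop every token whose option name was asked to be removed.
    removed = {option_name(t) for t in shlex.split(remove)}
    tokens = [t for t in tokens if option_name(t) not in removed]

    for new in shlex.split(append):
        name = option_name(new)
        for i, old in enumerate(tokens):
            if option_name(old) == name:
                tokens[i] = new  # override the earlier value in place
                break
        else:
            tokens.append(new)  # option not present yet: add it at the end
    return tokens


result = edit_wpull_args(
    "--foo=1 --bar --baz=qux",
    remove="--baz",
    append="--foo=2 --blah",
)
print(" ".join(result))  # --foo=2 --bar --blah
```

Replacing in place keeps the original ordering of the default arguments, which matches the expected command line in the example above.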

ethus3h avatar Feb 26 '16 04:02 ethus3h

FYI, according to the click docs: "Sometimes, you have options that take more than one argument. For options, only a fixed number of arguments is supported."

However, combining is possible with click's multiple options (http://click.pocoo.org/6/options/#multiple-options), which would allow --wpull-args to be specified multiple times.
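
If --wpull-args were declared with click's `multiple=True`, the command function would receive a tuple with one string per occurrence. A minimal sketch of merging those strings into one argument list, using only the standard library; `combine_wpull_args` is a hypothetical helper, not something grab-site provides, and the shortened argument strings are just for illustration:

```python
import shlex


def combine_wpull_args(values):
    """Merge every --wpull-args value into one flat argument list, in order."""
    combined = []
    for value in values:
        # shlex.split honors quotes and backslash-escaped spaces like a shell.
        combined.extend(shlex.split(value))
    return combined


# One string per --wpull-args occurrence, as a tuple.
values = (
    "--warc-move=/home/grabbot/warcdealer/ --content-on-error",
    "--retry-connrefused --retry-dns-error --tries=1000",
)
print(combine_wpull_args(values))
```

With this approach, arguments from the shell function and arguments passed at call time would both reach wpull instead of the last occurrence silently winning.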

As for the other question, --retry-dns-error is a "yes" for me because it is a broad category that covers many things, including transient errors. --retry-connrefused is a "no" as it is much narrower and could get the unwary in trouble for repeatedly connecting to a server after being banned.

12As avatar Mar 08 '16 16:03 12As