ArchiveBot
Send `Cookie: over18=1` to all reddit URLs
On all requests that match `https?://[^/]+\.reddit\.com(/|$)`, we should send a `Cookie: over18=1` header so that we always get the content instead of the age wall.
Perhaps something for the new `pipeline/archivebot/wpull/plugin.py`?
Implementation note: `pipeline.py` (15ae3ca6a6831f2b1ae366a58d5620474f5b3d2c) already adds this cookie for the top-level URL.
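For illustration, the matching rule could be sketched as a standalone helper. This is only a sketch: `extra_headers` is a hypothetical name, and how it would be hooked into the wpull plugin is not shown, since that depends on the plugin API.

```python
import re

# Pattern taken verbatim from the issue text above.
REDDIT_URL = re.compile(r'https?://[^/]+\.reddit\.com(/|$)')


def extra_headers(url):
    """Return extra request headers for `url` (hypothetical helper)."""
    # re.match anchors at the start of the string, so a reddit URL that
    # only appears inside a query string of another site does not match.
    if REDDIT_URL.match(url):
        return {'Cookie': 'over18=1'}
    return {}
```

Note that the pattern as written requires a subdomain (`www.reddit.com`, `old.reddit.com`), so a bare `https://reddit.com/` would not match; whether that matters depends on whether such URLs are ever fetched without a redirect.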
Yeah, it would be better if this worked for any crawl that includes a reddit URL, not just those that start with a reddit URL.
Similarly, send `Cookie: NCR=1` to all `*.blogspot.com` URLs.
Basically, we should create a cookie jar for these and use wpull's `--load-cookies` option.
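A minimal sketch of generating such a jar with Python's standard library, assuming `--load-cookies` accepts the Netscape/Mozilla `cookies.txt` format (as wget's option of the same name does). The function names and the far-future expiry are assumptions made for the example, not part of ArchiveBot.

```python
import http.cookiejar

# (domain, name, value) triples taken from this thread.
COOKIES = [
    ('.reddit.com', 'over18', '1'),
    ('.blogspot.com', 'NCR', '1'),
]

# Assumption: a fixed far-future expiry (2038-01-19) so the cookies are
# written as persistent cookies instead of being skipped as session cookies.
FAR_FUTURE = 2147483647


def make_cookie(domain, name, value):
    """Build a persistent, non-secure cookie valid for the whole domain."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path='/', path_specified=True,
        secure=False, expires=FAR_FUTURE, discard=False,
        comment=None, comment_url=None, rest={},
    )


def write_cookie_jar(filename):
    """Write all COOKIES to `filename` in Netscape cookies.txt format."""
    jar = http.cookiejar.MozillaCookieJar(filename)
    for domain, name, value in COOKIES:
        jar.set_cookie(make_cookie(domain, name, value))
    jar.save()
```

Calling `write_cookie_jar('cookies.txt')` and passing the resulting file via `--load-cookies cookies.txt` would then apply the cookies to every matching host in a crawl, not only to the start URL.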
Also `_options=%7B%22pref_quarantine_optin%22%3A%20true%7D` on Reddit to get around the quarantine blocks.
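For reference, the `_options` value is just percent-encoded JSON. The snippet below decodes it and shows one way to rebuild it; the variable names are made up for the example.

```python
import json
from urllib.parse import quote, unquote

# Cookie value exactly as quoted in the comment above.
ENCODED = '%7B%22pref_quarantine_optin%22%3A%20true%7D'

# Percent-decoding yields a small JSON object.
decoded = unquote(ENCODED)
prefs = json.loads(decoded)

# Re-encoding the same object; safe='' forces ':' and '"' to be escaped too.
rebuilt = quote(json.dumps({'pref_quarantine_optin': True}), safe='')

# Both Reddit cookies combined into a single request header value.
cookie_header = 'over18=1; _options=' + ENCODED
```

Here `decoded` is `{"pref_quarantine_optin": true}` and `rebuilt` is identical to `ENCODED`.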
Related: #416
Some FC2 blogs are age-gated and require an `age_check=1` cookie. It should be sent to all `blog\d*\.fc2\.com` and `blog\d*\.fc2blog\.us` subdomains; it's set for the particular blog shard(?) domain when you click on the corresponding button on the age gate.
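A rough sketch of recognising those hosts and extracting the shard domain the cookie would belong to. It assumes that "subdomains" means at least one label in front of `blogN.fc2.com` or `blogN.fc2blog.us`; `fc2_shard` and the `example` blog name are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Host patterns from the comment above, combined into one expression.
# Assumption: a blog host has at least one label before the shard domain.
FC2_SHARD = re.compile(r'\.(blog\d*\.(?:fc2\.com|fc2blog\.us))$')


def fc2_shard(url):
    """Return the shard domain (e.g. 'blog42.fc2.com') or None."""
    host = urlsplit(url).hostname or ''
    match = FC2_SHARD.search(host)
    return match.group(1) if match else None


def fc2_headers(url):
    """Return the age-gate cookie header for FC2 blog URLs, else nothing."""
    if fc2_shard(url) is not None:
        return {'Cookie': 'age_check=1'}
    return {}
```

For a URL such as `http://example.blog42.fc2.com/`, `fc2_shard` returns `blog42.fc2.com`, which is the domain a cookie jar entry would need if the cookie really is scoped per shard.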