Adding support for HTTP Basic Auth
You can use the --httpBasicAuth flag to pass colon-separated username and password credentials (username:password) to the browser used for crawling.
For some reason I couldn't get page.authenticate() to work as advertised in the Puppeteer docs. It did work outside of browsertrix-crawler's browser, so maybe another setting is interfering? But setting the Authorization header directly worked nicely. Thanks for the tip @vnznznz!
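Setting the header directly means computing the Basic auth value yourself from the flag value. The sketch below is hypothetical (the name `basicAuthHeaderValue` is illustrative, not the code in this PR) and assumes the flag holds a single `username:password` string, encoded as described in RFC 7617.

```typescript
// Hypothetical helper: turn a "username:password" value into the value of an
// HTTP Basic `Authorization` request header (RFC 7617).
function basicAuthHeaderValue(credentials: string): string {
  // The username may not contain a colon, so the first colon separates it
  // from the password (which may itself contain colons).
  const idx = credentials.indexOf(":");
  if (idx <= 0) {
    throw new Error('expected credentials in the form "username:password"');
  }
  // Base64-encode the UTF-8 bytes of "username:password".
  const bytes = new TextEncoder().encode(credentials);
  const binary = Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  return "Basic " + btoa(binary);
}
```

With Puppeteer, one way to apply it would be `page.setExtraHTTPHeaders({ Authorization: basicAuthHeaderValue(creds) })`. Note that this attaches the header to every request the page makes, not only to the seed's host.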
Fixes #168
Hm, it's probably because we also override Fetch.requestPaused for recording but don't handle Fetch.authRequired, even though they're two separate events. The downside of this approach is that the auth header is sent with every request. Is that potentially a security risk? I think it will also be sent to third-party URLs.
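Handling Fetch.authRequired would mean credentials are only supplied when a server actually issues a challenge. In the Chrome DevTools Protocol this event is emitted only when Fetch.enable is called with `handleAuthRequests: true`, and the reply goes through Fetch.continueWithAuth. The sketch below only builds the reply parameters; the interfaces are local mirrors of the CDP shapes, and `continueWithAuthParams` is a hypothetical helper, untested against browsertrix-crawler's recorder.

```typescript
// Local mirrors of the CDP Fetch-domain shapes used in this sketch.
interface AuthChallenge {
  source?: "Server" | "Proxy";
  origin: string;
  scheme: string;
  realm: string;
}

interface AuthRequiredEvent {
  requestId: string;
  authChallenge: AuthChallenge;
}

type AuthResponse = "Default" | "CancelAuth" | "ProvideCredentials";

interface ContinueWithAuthParams {
  requestId: string;
  authChallengeResponse: {
    response: AuthResponse;
    username?: string;
    password?: string;
  };
}

// Normalize to scheme://host[:port]; returns null for unparseable input.
function originOf(value: string): string | null {
  try {
    return new URL(value).origin;
  } catch {
    return null;
  }
}

// Hypothetical helper: supply credentials only for a server challenge from
// the seed's origin; otherwise defer to the browser's default handling.
function continueWithAuthParams(
  event: AuthRequiredEvent,
  seedUrl: string,
  username: string,
  password: string,
): ContinueWithAuthParams {
  const { source = "Server", origin } = event.authChallenge;
  const expected = originOf(seedUrl);
  if (source === "Server" && expected !== null && originOf(origin) === expected) {
    return {
      requestId: event.requestId,
      authChallengeResponse: { response: "ProvideCredentials", username, password },
    };
  }
  return { requestId: event.requestId, authChallengeResponse: { response: "Default" } };
}
```

In the crawler this would presumably be wired to a Puppeteer `CDPSession` via `cdp.on("Fetch.authRequired", ...)` and `cdp.send("Fetch.continueWithAuth", ...)`, with `handleAuthRequests: true` added wherever the recorder already calls Fetch.enable.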
I wonder if the auth should be associated with a particular seed and then only sent with requests to that seed's domain. That'd be a bit more work, though, and would require some slight refactoring of seeds... I suppose you would probably never have two seeds with two different HTTP auth passwords?
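Restricting the header to a seed could look roughly like this. It is a minimal sketch assuming each seed carries an optional `httpBasicAuth` string; the `Seed` shape and `credentialsForRequest` name are hypothetical, not browsertrix-crawler's seed class. It compares full origins (scheme, host and port) rather than domains, which is stricter: subdomains and plain `http://` URLs get no credentials. Two seeds with different passwords each keep their own.

```typescript
// Hypothetical seed shape: only the fields needed for this sketch.
interface Seed {
  url: string;
  httpBasicAuth?: string; // "username:password"
}

// Return the credentials of the first seed whose origin matches the
// request's origin, or undefined so no auth is attached (e.g. third parties).
function credentialsForRequest(requestUrl: string, seeds: Seed[]): string | undefined {
  let origin: string;
  try {
    origin = new URL(requestUrl).origin;
  } catch {
    return undefined; // not a parseable absolute URL
  }
  for (const seed of seeds) {
    if (seed.httpBasicAuth && new URL(seed.url).origin === origin) {
      return seed.httpBasicAuth;
    }
  }
  return undefined;
}
```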
I was wondering about that too. I guess it raises the question of how it would show up in the Browsertrix user interface.
I don't think URL-specific auth would be easy to specify from the command line? But it would certainly work from the YAML configuration?
Maybe having both the blunt command-line version AND the ability to fine-tune it in the YAML config would be good?
seeds:
  - url: https://webrecorder.net/
    depth: 1
    scopeType: "prefix"
    httpBasicAuth: "alice:abc123"
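The combination suggested above, a blunt CLI default plus per-seed values from YAML, might be resolved like this. It is a sketch under assumptions: `SeedConfig`, `resolveSeeds` and the override rule (a seed's own `httpBasicAuth` wins over the flag) are hypothetical.

```typescript
// Hypothetical shapes for a seed as read from YAML and after resolution.
interface SeedConfig {
  url: string;
  httpBasicAuth?: string; // "username:password"
}

interface Credentials {
  username: string;
  password: string;
}

interface ResolvedSeed {
  url: string;
  auth?: Credentials;
}

// Split on the first colon; the password may contain further colons.
function parseCredentials(value: string): Credentials {
  const idx = value.indexOf(":");
  if (idx <= 0) {
    throw new Error('expected credentials in the form "username:password"');
  }
  return { username: value.slice(0, idx), password: value.slice(idx + 1) };
}

// A per-seed YAML value overrides the CLI flag; seeds with neither get no auth.
function resolveSeeds(seeds: SeedConfig[], cliAuth?: string): ResolvedSeed[] {
  return seeds.map((seed) => {
    const raw = seed.httpBasicAuth ?? cliAuth;
    return raw ? { url: seed.url, auth: parseCredentials(raw) } : { url: seed.url };
  });
}
```

Note that the CLI fallback hands the same credentials to every seed that lacks its own value, which is where the risk of leaking credentials across multiple seeds comes from.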
@edsu Can you try #616? I believe it should do what you want while keeping HTTP auth per seed, which allows the credentials to be specified directly in the seed URL as https://user:[email protected]/ (and it should work with the Browsertrix app as-is).
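A seed URL with embedded credentials can be split into a clean URL plus username and password using the standard WHATWG `URL` class. This is a minimal sketch; `splitUrlCredentials` is a hypothetical name, it is not necessarily how #616 handles it, and the `example.com` host in the usage is illustrative.

```typescript
interface UrlCredentials {
  url: string; // the seed URL with the userinfo removed
  username: string;
  password: string;
}

// Hypothetical helper: split "https://user:pass@host/" into a credential-free
// URL plus decoded username and password; null if the URL has no username.
function splitUrlCredentials(seedUrl: string): UrlCredentials | null {
  const u = new URL(seedUrl);
  if (!u.username) {
    return null;
  }
  // URL keeps userinfo percent-encoded, so decode before use.
  const username = decodeURIComponent(u.username);
  const password = decodeURIComponent(u.password);
  // Clearing both fields drops "user:pass@" from the serialized URL.
  u.username = "";
  u.password = "";
  return { url: u.href, username, password };
}
```

For example, `splitUrlCredentials("https://alice:abc123@example.com/path")` yields the URL `https://example.com/path` with username `alice` and password `abc123`, so the credentials never need to appear in the crawled URL itself.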
Closing in favor of #616. I suppose we could add CLI flags to that PR as well, but that introduces more risk of leaking credentials when there are multiple seeds, and it requires more work to integrate into the Browsertrix app, so it is less flexible.
Thanks for pushing this forward. It makes sense why you would want to have the auth attached to the seed.