courlan
Support for custom user agents in is_live_page()
Hi!
We're currently using courlan via trafilatura for some crawling. When running liveness checks on a host's URL, we get blocked because of the user-agent header, and we have no way to change it. I noticed that the redirection test used by is_live_page() contains some commented-out code that references user-agent headers.
Is there any interest in supporting custom headers, or in setting a different default user agent?
Thanks.
Hi @drFerg, definitely. Trafilatura supports custom user-agent settings, and courlan could do so as well. The config file approach could be replicated here.
Are you interested in drafting a pull request?
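
For illustration, here is a rough sketch of what a liveness check with a configurable user agent might look like. It uses only the Python standard library and is not courlan's actual implementation. The names `build_head_request`, `is_live_page_with_agent` and `DEFAULT_USER_AGENT` are hypothetical, as is the default header value.

```python
import urllib.error
import urllib.request

# Hypothetical default; a real crawler would use its own identifying string.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-crawler/0.1)"


def build_head_request(url, user_agent=DEFAULT_USER_AGENT):
    """Build a HEAD request that carries the given User-Agent header."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def is_live_page_with_agent(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Return True if a HEAD request to the URL gets a non-error response.

    Sketch only: invalid URLs, network failures and HTTP error statuses
    are all treated as "not live".
    """
    try:
        request = build_head_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError (4xx/5xx) is a subclass of URLError; ValueError covers
        # strings that are not valid URLs.
        return False
```

A caller could then pass its own value, for example `is_live_page_with_agent(url, user_agent="my-bot/1.0")`. The value could equally be read from a config file, along the lines of the approach mentioned above.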