Support for custom user agents in is_live_page()

Open • drFerg opened this issue 1 year ago • 1 comment

Hi!

We're currently using courlan via trafilatura for some crawling. When we run liveness checks on a host's URL, we get blocked because of the user agent header, and we have no way to change it. I noticed that the redirection test, which is_live_page uses, contains some commented-out code that references user agent headers.

Is there any interest in supporting changing the headers or having a different one set?

Thanks.

drFerg, Aug 29 '24 14:08

Hi @drFerg, definitely. Trafilatura supports custom user-agent settings, and courlan could do so as well. The config file approach could be replicated here.

Are you interested in drafting a pull request?

adbar, Aug 29 '24 16:08
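
For anyone picking this up, here is a rough sketch of what a configurable user agent for a liveness check might look like. It is not courlan's code and does not reflect its internals: it uses only the Python standard library, and the names `SAMPLE_CONFIG`, `load_user_agents`, `build_head_request` and `is_live` are hypothetical, as is the config layout (a multi-line `USER_AGENTS` option, assumed here for illustration).

```python
import configparser
import random
import urllib.error
import urllib.request
from typing import List, Optional

# Hypothetical config layout with a multi-line USER_AGENTS option.
# This is an assumption for illustration, not courlan's actual settings.
SAMPLE_CONFIG = """
[DEFAULT]
USER_AGENTS =
    ExampleBot/1.0 (+https://example.org/bot)
    Mozilla/5.0 (compatible; ExampleCrawler/2.0)
"""


def load_user_agents(config_text: str) -> List[str]:
    """Return the non-empty lines of the USER_AGENTS option."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("DEFAULT", "USER_AGENTS", fallback="")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def build_head_request(
    url: str, user_agent: Optional[str] = None
) -> urllib.request.Request:
    """Build a HEAD request, setting User-Agent only when one is given."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers, method="HEAD")


def is_live(
    url: str,
    user_agents: Optional[List[str]] = None,
    timeout: float = 10.0,
) -> bool:
    """Send a HEAD request and report whether the server answered without error."""
    user_agent = random.choice(user_agents) if user_agents else None
    request = build_head_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError, ValueError):
        # HTTP errors (4xx/5xx), network failures and malformed URLs
        # all count as "not live" in this sketch.
        return False
```

Usage would be along the lines of `is_live("https://example.org", load_user_agents(SAMPLE_CONFIG))`. Picking a random agent per request is just one possible design; a single fixed value passed as a function argument would work equally well.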