Accept: text/plain; charset=UTF-8

Open wrmike1 opened this issue 3 years ago • 0 comments

If a server declares the character set of the text file with "text/plain; charset=UTF-8" (as it should), adstxtcrawler gets an HTTP 406 (Not Acceptable) response instead of downloading the ads.txt file. This seems to happen because adstxtcrawler sends an Accept header that lists only 'text/plain' and nothing else:

    myheaders = {
        'User-Agent':
        'AdsTxtCrawler/1.0; +https://github.com/InteractiveAdvertisingBureau/adstxtcrawler',
        'Accept':
        'text/plain',
    }

I think one of these will fix the issue:

'text/plain,*/*',
'text/plain,text/plain; charset=UTF-8',
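As a rough illustration of the first option, here is a minimal sketch that builds (but does not send) a request with the widened Accept value. The helper name `build_request` and the example URL are hypothetical, and it uses the standard library's `urllib.request` only to stay self-contained; the crawler's own request code may look different.

```python
from urllib.request import Request

# User-Agent string copied from the crawler's existing headers.
USER_AGENT = (
    "AdsTxtCrawler/1.0; "
    "+https://github.com/InteractiveAdvertisingBureau/adstxtcrawler"
)


def build_request(url, accept="text/plain,*/*"):
    """Build a request object with the proposed Accept header.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    headers = {
        "User-Agent": USER_AGENT,
        # Still lists text/plain first, but falls back to any media type.
        "Accept": accept,
    }
    return Request(url, headers=headers)


# Example URL is a placeholder, not a real test server.
req = build_request("https://example.com/ads.txt")
print(req.get_header("Accept"))
```

The same dictionary could be dropped into `myheaders` unchanged; only the 'Accept' value differs from the current code.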

My web browser, for instance, sends:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8

It accepts anything because of the */* wildcard.
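To make the wildcard behaviour concrete, here is a simplified sketch of how media ranges in an Accept header can be matched against a response's Content-Type. The function `accepts` is hypothetical and deliberately lenient: it ignores q-values and parameters such as charset, so it is not how any particular server decides to send a 406.

```python
def accepts(accept_header, content_type):
    """Return True if any media range in accept_header matches content_type.

    Simplified sketch: parameters (q-values, charset) are ignored.
    """
    # Drop parameters such as "; charset=UTF-8" from the content type.
    ctype = content_type.split(";", 1)[0].strip().lower()
    main, _, sub = ctype.partition("/")

    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.8" from each media range.
        media_range = item.split(";", 1)[0].strip().lower()
        if media_range == "*/*":
            return True  # full wildcard matches any media type
        r_main, _, r_sub = media_range.partition("/")
        if r_main == main and r_sub in ("*", sub):
            return True
    return False


browser = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,"
    "image/webp,*/*;q=0.8"
)
print(accepts(browser, "text/plain; charset=UTF-8"))
print(accepts("text/html", "text/plain; charset=UTF-8"))
```

Under this lenient matching a bare 'text/plain' would also match "text/plain; charset=UTF-8"; the 406 described above suggests the server in question compares more strictly, which is why adding */* is the safer fallback.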

I can easily set up a server for you to test this with if you want.

wrmike1, Apr 17 '21 14:04