govuk_crawler_worker Crawler errors when crawling subdirectories of JSON output

Open alexmuller opened this issue 9 years ago • 0 comments

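On GOV.UK we have URLs like this, where one path (/api/content/browse) is both a page in its own right and the parent of another page: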

https://www.gov.uk/api/content/raib-reports
https://www.gov.uk/api/content/browse
https://www.gov.uk/api/content/browse/justice

@alext has commented elsewhere:

This is due to a limitation of the crawler. It ends up trying to create both a file and a directory with the same name. For HTML content it creates the file with a .html extension. It doesn't add an extension for JSON content, hence the clash.
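
To make the clash concrete, here is a minimal Python sketch. It is not the crawler's actual code: mirror_path, find_clashes and the EXTENSIONS table are hypothetical names, and the only behaviour taken from the comment above is that HTML gets a .html extension and JSON gets none. It maps the three URLs to relative file paths and reports any path that would be needed both as a file and as a directory.

```python
from pathlib import PurePosixPath

# Hypothetical content-type -> extension table. Per the comment above, HTML is
# written with ".html" and JSON with no extension. Not the crawler's real code.
EXTENSIONS = {"text/html": ".html", "application/json": ""}


def mirror_path(url_path, content_type, extensions=EXTENSIONS):
    """Return the relative file path a response would be written to."""
    return PurePosixPath(url_path.lstrip("/") + extensions[content_type])


def find_clashes(paths):
    """Return paths needed both as a file and as a parent directory."""
    files = set(paths)
    directories = {parent for p in paths for parent in p.parents}
    return sorted(files & directories)


urls = [
    "/api/content/raib-reports",
    "/api/content/browse",
    "/api/content/browse/justice",
]

# No extension for JSON: "api/content/browse" must be a file and a directory.
without_ext = [mirror_path(u, "application/json") for u in urls]
clashes = find_clashes(without_ext)

# One possible workaround (an assumption, not a confirmed fix): give JSON
# responses an extension as well, mirroring what happens for HTML.
json_ext = {**EXTENSIONS, "application/json": ".json"}
with_ext = [mirror_path(u, "application/json", json_ext) for u in urls]
fixed_clashes = find_clashes(with_ext)

print("clashes without extension:", [str(p) for p in clashes])
print("clashes with .json extension:", [str(p) for p in fixed_clashes])
```

In this sketch the extensionless mapping reports api/content/browse as a clash, while the .json variant reports none. Whether adding an extension is acceptable for the real mirror depends on how the saved files are later served, which this issue does not cover.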

Oct 12 '15 13:10 alexmuller
