govuk_crawler_worker
Crawler errors when crawling subdirectories of JSON output
On GOV.UK we have URLs like these, where one path (`browse`) is also the parent of another (`browse/justice`):
- https://www.gov.uk/api/content/raib-reports
- https://www.gov.uk/api/content/browse
- https://www.gov.uk/api/content/browse/justice
@alext has commented elsewhere:
This is due to a limitation of the crawler. It ends up trying to create both a file and a directory with the same name. For HTML content it creates the file with a `.html` extension, so the file and the directory get different names. It doesn't add an extension for JSON content, hence the clash.
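
The clash described above can be reproduced with a small sketch. This is not the crawler's actual code: `file_path_for`, `save` and `try_save` are hypothetical helpers, and the mapping from URL to disk path (append `.html` for HTML, nothing for JSON) is an assumption based only on the comment above.

```python
import os
import tempfile
from urllib.parse import urlparse


def file_path_for(url: str, content_type: str, root: str) -> str:
    """Hypothetical mapping from a URL to a path under `root`.

    Assumption: HTML pages get a `.html` suffix, JSON responses get none.
    """
    path = urlparse(url).path.strip("/")
    if content_type == "text/html":
        path += ".html"
    return os.path.join(root, *path.split("/"))


def save(url: str, content_type: str, body: str, root: str) -> str:
    """Write `body` to the mapped path, creating parent directories."""
    target = file_path_for(url, content_type, root)
    # Raises FileExistsError if a parent path already exists as a file.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:
        f.write(body)
    return target


def try_save(url: str, content_type: str, body: str, root: str) -> bool:
    """Return True if the page was written, False on a file/directory clash."""
    try:
        save(url, content_type, body, root)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for url in (
        "https://www.gov.uk/api/content/browse",
        "https://www.gov.uk/api/content/browse/justice",
    ):
        ok = try_save(url, "application/json", "{}", root)
        print(url, "saved" if ok else "clashed")
```

In this sketch the first JSON URL is written as a file named `browse`, so the second one fails because it needs `browse` to be a directory. With `text/html` the first page would be written as `browse.html`, leaving the name `browse` free for the directory.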