govuk_crawler_worker

Crawler errors when crawling subdirectories of JSON output

Open alexmuller opened this issue 9 years ago • 0 comments

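On GOV.UK we have URLs like these, where /api/content/browse is both a document in its own right and the parent path of /api/content/browse/justice: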

  • https://www.gov.uk/api/content/raib-reports
  • https://www.gov.uk/api/content/browse
  • https://www.gov.uk/api/content/browse/justice

@alext has commented elsewhere:

This is due to a limitation of the crawler. It ends up trying to create both a file and a directory with the same name. For HTML content it creates the file with a .html extension. It doesn't add an extension for JSON content, hence the clash.
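
To illustrate the clash described above, here is a minimal Go sketch of one possible fix: give JSON responses a .json extension, the same way HTML responses get .html. The function name filePathFor, its signature, and the on-disk layout (host followed by URL path) are assumptions made for illustration, not the crawler's actual code.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// filePathFor maps a crawled URL to a relative path on disk.
// Hypothetical helper: the real crawler's naming logic may differ.
// It appends an extension based on the response content type, so a
// document such as /browse is written to "browse.json" and no longer
// shares a name with the "browse/" directory that holds its children.
func filePathFor(rawURL, contentType string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}

	p := strings.TrimSuffix(u.Path, "/")
	if p == "" {
		p = "/index" // assumed name for the site root
	}

	switch {
	case strings.HasPrefix(contentType, "text/html"):
		p += ".html"
	case strings.HasPrefix(contentType, "application/json"):
		p += ".json"
	}

	return path.Join(u.Host, p), nil
}

func main() {
	urls := []string{
		"https://www.gov.uk/api/content/browse",
		"https://www.gov.uk/api/content/browse/justice",
	}
	for _, raw := range urls {
		p, err := filePathFor(raw, "application/json")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(p)
	}
}
```

With this mapping the first URL is stored as the file browse.json and the second as browse/justice.json, so the file and the directory no longer need the same name. Anything that reads the mirrored files back would have to apply the same extension rule.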

alexmuller, Oct 12 '15 13:10