
Return 404 instead of 400 responses for obviously-invalid URLs

Open robertknight opened this issue 1 year ago • 1 comments

Requests for "obviously invalid" URLs like https://via.hypothes.is/wp-admin return 400 responses instead of 404. This is inconvenient because we cannot easily filter out such responses in, e.g., the New Relic metrics that monitor the overall error rate of the service.

We have encountered situations in which a bot hits a large number of URLs like this in a short window of time, typically looking for vulnerabilities in common PHP packages. This has triggered an alarm that fires when 80%+ of the service's requests are failing for a period of time (10-15 minutes).

The reason for the 400 here is that /wp-admin matches the general route for proxying websites, which treats the part after the initial / as a URL in which the protocol is optional. CheckmateClient.check_url fails to parse wp-admin as a public URL and raises BadURL, which results in a 400 response.
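One possible shape for a fix is a cheap pre-check that runs before the URL is handed to CheckmateClient.check_url. The sketch below is only an illustration using the Python standard library: the function name early_status and the "host must contain a dot" heuristic are assumptions, not existing Via code, and it does not reproduce what check_url actually does.

```python
from typing import Optional
from urllib.parse import urlsplit


def early_status(path: str) -> Optional[int]:
    """Hypothetical pre-check for the catch-all proxy route.

    Returns 404 when the path cannot plausibly name a public website,
    400 when it looks like a URL but is malformed, and None when the
    request should continue to the normal URL check.
    """
    candidate = path.lstrip("/")
    if not candidate:
        return 404

    # The route treats the protocol as optional, so add one before parsing.
    if "://" not in candidate:
        candidate = "https://" + candidate

    try:
        host = urlsplit(candidate).hostname or ""
    except ValueError:
        # For example an unbalanced "[" in the host part.
        return 400

    # Assumption: a public hostname has at least two labels, so a bare
    # word such as "wp-admin" is treated as "no such page".
    if "." not in host.strip("."):
        return 404

    return None
```

With this heuristic /wp-admin yields 404 while /example.com/page falls through to the existing check. Paths such as /wp-login.php still contain a dot, so they would pass the pre-check and be handled as before.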

For context, see https://hypothes-is.slack.com/archives/C074BUPEG/p1728300410941439?thread_ts=1728292002.576029&cid=C074BUPEG.

New Relic alert: https://one.newrelic.com/alerts/issue?account=1385283&duration=259200000&state=e0b2c426-026d-27ee-4aa8-b0894fb965d1

robertknight, Oct 07 '24 11:10

Some other options:

  • Modify alert conditions to exclude all errors of type BadURL.
  • Modify alert conditions to exclude errors with status 400. This might be a problem because 400 is a general "bad request" status that is potentially used in other contexts, and we do want to be notified if the volume of those increases.
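To make the first option concrete, here is a rough sketch of the arithmetic behind an error-rate alert that ignores one error type. It is plain Python rather than a New Relic alert condition, and the record shape (status, error_type) and the choice to drop ignored requests from both the numerator and the denominator are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record shape: (HTTP status, error class name or None).
Request = Tuple[int, Optional[str]]


def failure_rate(
    requests: Iterable[Request],
    ignored_errors: frozenset = frozenset({"BadURL"}),
) -> float:
    """Fraction of failing requests, skipping ignored error types."""
    total = 0
    failed = 0
    for status, error_type in requests:
        if error_type in ignored_errors:
            # Assumption: ignored requests count neither as failures
            # nor towards the total.
            continue
        total += 1
        if status >= 400:
            failed += 1
    return failed / total if total else 0.0


# A bot scan: 90 BadURL failures alongside 10 successful requests.
window = [(400, "BadURL")] * 90 + [(200, None)] * 10
rate_with_filter = failure_rate(window)
rate_without_filter = failure_rate(window, ignored_errors=frozenset())
```

In this made-up window the unfiltered rate is above the 80% threshold, while the filtered rate is zero. Excluding every 400 instead would also hide other bad-request errors, which is the concern raised in the second option.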

An advantage of making these requests return a 404 in Via is that it matches how other services would respond in the same scenario, where, e.g., /wp-admin would not match any routes.

robertknight, Oct 07 '24 12:10