Getting `404 not found` errors for missed blocks instead of a specific missed error code
Description
When I fetch the slot data of a missed slot with eth/v2/beacon/blocks and eth/v1/beacon/headers, the response is 404 Not Found, instead of a specific error indicating this is a missed block, which makes it difficult to differentiate the following 2 cases:
- Cannot fetch slot data because it's a missed slot.
- Cannot fetch slot data because nodes are out of sync, or the slot hasn't been proposed yet.
Wondering should client return a specific error for missed slot?
Version
Lighthouse/v4.5.0-441fc16/x86_64-linux
Present Behaviour
When fetching the slot data of a missed slot with eth/v2/beacon/blocks and eth/v1/beacon/headers, the response is 404 Not Found.
For example, when fetch block header for https://holesky.beaconcha.in/slot/1, the response is
{ "code": 404, "message": "NOT_FOUND: beacon block at slot 1", "stacktraces": [] }
Expected Behaviour
Should client return a specific error/error code for missed slots?
Steps to resolve
Please describe the steps required to resolve this issue, if known.
I agree this can be a bit confusing, but the spec mandates a 404 in the case where a block is skipped
Some options:
- Return a 400 for slots in the future
- Return a 400 for slots prior to the database's backfill limit (
oldest_block_slot)
IMO, as long as we can differentiate skipped blocks with other not_found cases should be fine, we can either return with a specific error code or even keep 404 but with a specific error message.
I discussed this with the other beacon-APIs maintainers and we didn't yet reach a conclusion on how to proceed, although we agreed that something should be done.
Jacek (@arnetheduck) from Nimbus pointed out that it would be good to be able to differentiate these cases by status code:
- I don't know this route - give up on this server
- I don't know about this slot - ask someone else or later
- I know this slot doesn't have a block - move on to the next slot
At the moment they are all 404s.
The problem is we couldn't agree on which status codes to use, as none from the HTTP spec are quite right. E.g.
- Route unknown: 405 Method Not Allowed (not quite right, because we don't accept any method on the route, so this should really be a 404)
- Slot unknown: 410 Gone (not quite right, because 410 usually implies permanent loss and the absence of old blocks & states can be remedied by backfilling them)
- Slot does not have a block: 404 (fine)
There was also a desire by @rolfyone to avoid 5xx status codes, which is fair, seeing as they usually indicate bugs. That said, I think 503 Service Unavailable could be useful for the "I don't know about this slot, try again later" because its descriptions matches quite closely:
The HyperText Transfer Protocol (HTTP) 503 Service Unavailable server error response code indicates that the server is not ready to handle the request.
Common causes are a server that is down for maintenance or that is overloaded. This response should be used for temporary conditions and the Retry-After HTTP header should, if possible, contain the estimated time for the recovery of the service.
(emphasis added)
As a middle-ground I'm now proposing:
- Route unknown: 404
- Slot unknown/unavailable: 503
- Slot known to be empty: 404
This still combines two cases with a 404 status, but is still an improvement over the current state of things. The argument would be that callers should be able to rule out the non-existence of the route by checking their software version or by reading the error message returned (assuming they can get at it under layers of reverse proxies and similar).
503 wfm, great if you can get that upstreamed ;)
for "slot known to be empty" I actually think 200 is the right code - with an explicit response that indicates an empty slot.
Ooh, a 200 with an empty body and a header that indicates it was intentional? 🤔 Thinking of something that works for JSON & SSZ, but maybe for JSON a response like this would be better:
{
"data": null,
}
out of band we discussed the potential application of 204 with the metadata in headers, which may be appropriate... just transferring this part of that conversation, because there were a lot of ideas but this one seemed to have a level of consensus...
So really
- there's keywords:
head,finalized,genesisetc - at genesis, finalized / head would still be set according to spec - theres roots: having 404 not found for a root that doesn't exist i could see as reasonable?
- there's slots: this is the scenario we're likely to know that there's a slot N which is empty and will never return a block. In this instance having a 204 for a slot that is before head and contained no block would make sense.
It's still reasonable to think that something beyond the head slot may actually result in an error (such as 400) because it's a future slot and just basically a bad request... Anyway, I figured my initial response was probably worth giving a little more of a breakdown to.