[all] Error Refactor
Describe the problem
Our errors have a few issues:
- They often have several nested contexts and can be very difficult to parse through
- It is not clear which ones are retryable (message dropped, timeout, etc.) and which ones should not be retried
- They can blow up a console, and it often takes one of us to sort through them and extract the useful information
The goal
Errors should:
- Be informative, but not swamp you with a deluge of information that may or may not be helpful
- Not have nested contexts; all of that can go in one place
- Have clear codes, and if possible, resolution steps
- Be unique, and clearly traceable to the part of the stack they originate in (Inbound vs. Outbound in protocol... bit of an oof dealing with those)
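To make the goals above concrete, here is a rough sketch of what such an error could look like. Everything in it is an assumption for illustration: the `CodedError` class, the `E001`-style codes, the origin labels, and the `shouldRetry` helper are hypothetical names, not existing code. The idea is one flat context object, a code that maps to a definition, and an explicit retryable flag.

```typescript
// Hypothetical sketch: none of these names exist in the codebase.
type ErrorCode = "E001" | "E002" | "E003";
type ErrorOrigin = "protocol:inbound" | "protocol:outbound" | "messaging" | "store";

interface ErrorDefinition {
  description: string;
  retryable: boolean;
  resolution?: string;
}

// Single source of truth for codes, retry semantics, and resolution steps.
const ERROR_DEFINITIONS: Record<ErrorCode, ErrorDefinition> = {
  E001: { description: "Message dropped", retryable: true, resolution: "Retry the request." },
  E002: { description: "Request timed out", retryable: true, resolution: "Retry with a longer timeout." },
  E003: { description: "Invalid parameters", retryable: false },
};

class CodedError extends Error {
  readonly code: ErrorCode;
  readonly origin: ErrorOrigin;
  readonly retryable: boolean;
  readonly resolution: string | undefined;
  // One flat context object instead of nested contexts.
  readonly context: Record<string, unknown>;

  constructor(code: ErrorCode, origin: ErrorOrigin, context: Record<string, unknown> = {}) {
    const def = ERROR_DEFINITIONS[code];
    super(`[${code}] ${def.description} (origin: ${origin})`);
    // Keeps instanceof working when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "CodedError";
    this.code = code;
    this.origin = origin;
    this.retryable = def.retryable;
    this.resolution = def.resolution;
    this.context = context;
  }
}

// Callers can decide on a retry without parsing the message.
function shouldRetry(err: unknown): boolean {
  return err instanceof CodedError && err.retryable;
}
```

With something along these lines, a caller could write `if (shouldRetry(e)) { ... }` instead of matching on message strings, and the origin in the message would make the Inbound vs. Outbound case distinguishable at a glance.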
Some inspiration
- Hardhat errors
- Advice from the graph team:
We've started using error codes and linking to a GitHub page in the indexer logs, to give people a more in-depth explanation of every error they might see: https://github.com/graphprotocol/indexer/blob/master/docs/errors.md
Every log message includes an "errorUrl": ... kind of field pointing to the specific error explanation.
However, as you can see, we've hardly managed to fill out the descriptions so far. :wink: The error definitions are here: https://github.com/graphprotocol/indexer/blob/ab34268709111ed013cac9f0a65d2cc9745428cd/packages/indexer-common/src/errors.ts
The way we wrap errors in indexer errors is: https://github.com/graphprotocol/indexer/blob/deedd11a42a7d7b02bec3e1b9fe5b8844a3c0ac8/packages/indexer-agent/src/network.ts#L342-L346
One thing that's been useful to us and our indexers is automatically tracked error metrics: https://github.com/graphprotocol/indexer/blob/ab34268709111ed013cac9f0a65d2cc9745428cd/packages/indexer-common/src/errors.ts#L104-L106
That way everyone can track how often they are seeing which errors. In the testnet we could see every individual's error numbers and help specific people out with their issues.
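A minimal sketch of how the two ideas in that advice (an `errorUrl` field on every log entry, and per-code error counts) might look on our side. This is not the indexer's implementation: the base URL, the `recordError` function, and the in-memory `errorCounts` map are all hypothetical, and a real version would presumably use an actual metrics client instead of a `Map`.

```typescript
// Hypothetical base URL; point this at wherever the error reference page lives.
const ERROR_DOCS_URL = "https://example.com/docs/errors.md";

// In-memory stand-in for a real metrics client (assumption for illustration).
const errorCounts = new Map<string, number>();

interface ErrorLogEntry {
  code: string;
  message: string;
  errorUrl: string;
}

// Counts the error by code and returns a log entry that links to its explanation.
function recordError(code: string, message: string): ErrorLogEntry {
  errorCounts.set(code, (errorCounts.get(code) ?? 0) + 1);
  return {
    code,
    message,
    errorUrl: `${ERROR_DOCS_URL}#${code.toLowerCase()}`,
  };
}
```

Calling `recordError("E002", "Request timed out")` on every failure would give each log line a link to follow, and `errorCounts` would show which codes come up most often.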