cardano-graphql fatal errors don't fully fail the systemd service unit
Summary
When cardano-graphql runs as a NixOS systemd service, a logged error can, at least in some cases, leave the service in a fatal state: the cardano-graphql process stops doing useful work, and the unit shows none of its normal activity afterwards. The process itself does not exit, however, so systemd does not restart the service even though it is non-functional, and cardano-graphql stays blocked until someone intervenes manually.
This might be due to an exception in cardano-graphql that is eventually caught here: the cardano-graphql server stops, but the node process continues running, so systemd believes the service is still running.
If so, logging a message that the server is exiting due to an exception after logging the error would be helpful. Logging the request associated with the exception would also be helpful.
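The logging suggested above could look roughly like the following sketch. It is not taken from the cardano-graphql codebase: `fatalLogLines`, `LogLine`, and `FailedRequest` are hypothetical names, and the numeric levels assume the bunyan convention visible in the log output in this report (50 for error, 60 for fatal).

```typescript
// Hypothetical sketch; none of these names come from the cardano-graphql codebase.

interface LogLine {
  level: number // bunyan-style numeric level (assumed): 50 = error, 60 = fatal
  msg: string
}

interface FailedRequest {
  query: string
}

// Builds the log lines to emit when an exception is about to stop the server:
// the error itself, the request that triggered it (if known), and an explicit
// notice that the server is exiting.
function fatalLogLines(err: unknown, request?: FailedRequest): LogLine[] {
  const reason = err instanceof Error ? err.message : String(err)
  const lines: LogLine[] = [{ level: 50, msg: reason }]
  if (request !== undefined) {
    lines.push({ level: 50, msg: `Failed request: ${request.query}` })
  }
  lines.push({ level: 60, msg: 'Server exiting due to an unrecoverable exception' })
  return lines
}
```

With an explicit final line at the fatal level, an operator reading the journal can tell a server that has shut down apart from one that has merely logged a recoverable error.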
Steps to reproduce the bug
Run cardano-graphql in an explorer stack under load. An example of the problem, which occurs at random: the Hasura client initializes, then an error is thrown, after which there is no further activity in the process.
cardano-graphql-start[$PID]: {"name":"cardano-graphql","hostname":"explorer-a","pid":$PID,"level":30,"module":"HasuraClient","msg":"Initializing","time":"$TIMESTAMP","v":0}
cardano-graphql-start[$PID]: {"name":"cardano-graphql","hostname":"explorer-c","pid":$PID,"level":50,"msg":"database query error: {\"response\":{\"errors\":[{\"extensions\":{\"code\":\"unexpected\",\"path\":\"$\"},\"message\":\"database query error\"}],\"status\":200},\"request\":{\"query\":\"query {\\n epochs (limit: 1, order_by: { number: desc }) {\\n adaPots {\\n reserves\\n }\\n }\\n rewards_aggregate {\\n aggregate {\\n sum {\\n amount\\n }\\n }\\n }\\n utxos_aggregate {\\n aggregate {\\n sum {\\n value\\n }\\n }\\n }\\n withdrawals_aggregate {\\n aggregate {\\n sum {\\n amount\\n }\\n }\\n }\\n }\"}}","time":"$TIMESTAMP","v":0}
Actual Result
- Manual intervention is required to restart the non-functional cardano-graphql service.
Expected Result
- A fatal error that renders the cardano-graphql process non-functional should make the process exit with a failure code, so that systemd recognizes a unit failure and takes its predetermined action.
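One way to get the expected behavior is to make sure any otherwise-unhandled error ends the process with a non-zero exit code. The sketch below is only an illustration under assumptions: `installFatalHandlers` and `FatalEventSource` are hypothetical names, and the event source, logger, and exit function are injected so the logic can be exercised without terminating a real process. The event names `uncaughtException` and `unhandledRejection` are the ones Node.js emits on `process`. Note that on Node.js v12 an unhandled promise rejection produces a warning by default rather than terminating the process, which may be relevant here.

```typescript
// Hypothetical sketch; none of these names come from the cardano-graphql codebase.

// Minimal shape of an event source such as Node's `process` object.
interface FatalEventSource {
  on(event: string, listener: (reason: unknown) => void): unknown
}

// Registers handlers that log the failure and then exit with code 1,
// so a supervisor such as systemd sees the unit as failed.
function installFatalHandlers(
  source: FatalEventSource,
  log: (msg: string) => void,
  exit: (code: number) => void
): void {
  const onFatal = (reason: unknown): void => {
    const detail = reason instanceof Error ? reason.message : String(reason)
    log(`Fatal error, exiting with code 1: ${detail}`)
    exit(1)
  }
  source.on('uncaughtException', onFatal)
  source.on('unhandledRejection', onFatal)
}
```

In a real entry point this might be wired up as `installFatalHandlers(process, console.error, process.exit)`. Whether systemd then restarts the service depends on the unit's `Restart=` setting.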
Environment
Cardano-graphql 7.0.X and newer unreleased test branches
Platform
- [ ] Linux (Ubuntu)
- [X] Linux (Other)
- [ ] macOS
- [ ] Windows
Platform version
NixOS 21.11
Runtime
- [X] Node.js
- [ ] Docker
Runtime version
v12.15.0