[Bug] Flaky DeploymentNotFound Error when deploying subgraphs
Bug report
Running a self-hosted graph-node, v0.31.0.
Noticed a flaky situation where a subgraph deployment sometimes fails with a DeploymentNotFound error. Even though graph-cli reports an error, the subgraph seems to start indexing properly.
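One way to confirm that the deployment is indexing despite the reported error is to ask the index-node status API about the hash. The sketch below only builds the request body and picks the matching entry out of a response; it does not send anything. It assumes the status API is reachable (commonly on port 8030 at /graphql, which may differ in this setup), and the helper names are hypothetical.

```python
import json
from typing import Optional


def build_status_request(ipfs_hash: str) -> dict:
    """Build a GraphQL request body for the index-node status API.

    The hash is inlined as a JSON string literal, which is also a valid
    GraphQL string literal for a plain Qm... hash.
    """
    query = (
        "{ indexingStatuses(subgraphs: [%s]) { "
        "subgraph synced health fatalError { message } "
        "chains { latestBlock { number } chainHeadBlock { number } } } }"
    ) % json.dumps(ipfs_hash)
    return {"query": query}


def find_status(response: dict, ipfs_hash: str) -> Optional[dict]:
    """Return the status entry for ipfs_hash, or None if the node reports none."""
    statuses = (response.get("data") or {}).get("indexingStatuses") or []
    for status in statuses:
        if status.get("subgraph") == ipfs_hash:
            return status
    return None
```

POSTing the result of `build_status_request(...)` as JSON to the status endpoint and passing the decoded response to `find_status(...)` gives either the entry for that deployment or `None`, which makes it easy to record whether a deploy that returned DeploymentNotFound actually left a running deployment behind.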
Subsequent deploys of the same IPFS hash result in a different error: duplicate key value violates unique constraint "subgraph_deployment_id_key".
Tracing into the code, the error likely originates in create_deployment_internal.
set_on_sync, called by create_deployment, is one potential spot, but checking the subgraph_manifest table shows no missing rows.
create_subgraph_version is another possible spot, but the subgraph_deployment table looks fine as well.
Perhaps these tables were not populated correctly at time of the error due to a race condition or database issue? Looking at metrics, the database was not under a lot of load at the time.
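To make the suspected sequence concrete, here is a toy model of how the two errors could be related. This is not graph-node code and does not claim to describe what create_deployment_internal actually does. It only assumes, hypothetically, that the first attempt writes its row and then fails a follow-up lookup without rolling back. All class and function names are made up for illustration.

```python
class DeploymentNotFound(Exception):
    pass


class DuplicateKey(Exception):
    pass


class ToyStore:
    """In-memory stand-in for a table with a unique key on the deployment id."""

    def __init__(self, fail_first_lookup: bool):
        self.rows = set()
        self._fail_next_lookup = fail_first_lookup

    def insert(self, ipfs_hash: str) -> None:
        if ipfs_hash in self.rows:
            raise DuplicateKey(ipfs_hash)
        self.rows.add(ipfs_hash)

    def lookup(self, ipfs_hash: str) -> str:
        # Simulate one transient failure of the lookup (the hypothetical race).
        if self._fail_next_lookup:
            self._fail_next_lookup = False
            raise DeploymentNotFound(ipfs_hash)
        if ipfs_hash not in self.rows:
            raise DeploymentNotFound(ipfs_hash)
        return ipfs_hash


def deploy(store: ToyStore, ipfs_hash: str) -> str:
    store.insert(ipfs_hash)         # step 1 persists the row
    return store.lookup(ipfs_hash)  # step 2 may fail; step 1 is not undone


def outcome(store: ToyStore, ipfs_hash: str) -> str:
    """Run one deploy attempt and report 'ok' or the name of the error raised."""
    try:
        deploy(store, ipfs_hash)
        return "ok"
    except (DeploymentNotFound, DuplicateKey) as err:
        return type(err).__name__
```

In this model the first `outcome(...)` call reports DeploymentNotFound while the row is already present, and a second call with the same hash reports DuplicateKey. That would match both the tables looking fine afterwards and the changed error on redeploy, but it is only a hypothesis.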
Would appreciate a second pair of eyes on this, as the error is flaky and has not yet been reproduced while debugging.
Relevant log output
# Log from graph-node on first deployment
Jul 11 16:44:26.941 ERRO subgraph_deploy failed, params: SubgraphDeployParams { name: SubgraphName("c47020ff-bcc6-44f4-8296-9b0ec6cb783b"), ipfs_hash: DeploymentHash("QmYSk536LvM4fqqqCVDzjCqsNochPcnwPGXFkWbUCHzLBP"), node_id: None, debug_fork: None, history_blocks: None }, error: SubgraphDeploymentError(DeploymentNotFound("QmYSk536LvM4fqqqCVDzjCqsNochPcnwPGXFkWbUCHzLBP")), component: JsonRpcServer
# Log from graph-node on subsequent deployment
Jul 12 10:14:25.099 ERRO subgraph_deploy failed, params: SubgraphDeployParams { name: SubgraphName("d94932a4-daca-4443-84dd-fddc1169cab7"), ipfs_hash: DeploymentHash("QmYSk536LvM4fqqqCVDzjCqsNochPcnwPGXFkWbUCHzLBP"), node_id: None, debug_fork: None, history_blocks: None }, error: SubgraphDeploymentError(Unknown(duplicate key value violates unique constraint "subgraph_deployment_id_key")), component: JsonRpcServer
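Since the error cannot be reproduced on demand, it may help to correlate failures across the graph-node logs by deployment hash. The sketch below parses lines shaped like the two above and groups them by hash. The regular expression is derived only from these two sample lines, so other log formats may need adjustments, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Matches "subgraph_deploy failed" lines in the format shown in this report.
_FAILURE = re.compile(
    r'^(?P<ts>\w{3} \d{1,2} [\d:.]+) ERRO subgraph_deploy failed, .*?'
    r'SubgraphName\("(?P<name>[^"]+)"\).*?'
    r'DeploymentHash\("(?P<hash>[^"]+)"\).*?'
    r'error: SubgraphDeploymentError\((?P<kind>\w+)\('
)


def parse_failure(line: str) -> Optional[dict]:
    """Extract timestamp, subgraph name, hash and error kind, or None if no match."""
    match = _FAILURE.search(line)
    return match.groupdict() if match else None


def group_by_hash(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group parsed deploy failures by deployment hash, in input order."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        parsed = parse_failure(line)
        if parsed is not None:
            grouped[parsed["hash"]].append(parsed)
    return dict(grouped)
```

Run over the two lines above, this yields a single hash with two entries: a DeploymentNotFound under the first subgraph name, followed by an Unknown (the duplicate key violation) under the second.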
IPFS hash
No response
Subgraph name or link to explorer
No response
Some information to help us out
- [ ] Tick this box if this bug is caused by a regression found in the latest release.
- [ ] Tick this box if this bug is specific to the hosted service.
- [X] I have searched the issue tracker to make sure this issue is not a duplicate.
OS information
Linux
Hey @kevin-satsuma, can you elaborate on when this DeploymentNotFound error occurs? Is it random?
Sure! The DeploymentNotFound error seems to be random so far. When the error occurs, resource usage on the indexer and database both look normal as well. This has made it difficult to investigate without the ability to reproduce the error on demand.
Hey @kevin-satsuma, is this running in combined mode with a single database (i.e. not using Graph Node sharding)?
@azf20 The graph-node instance showing this error is running with default node_role of combined-node, but we do not send any queries to it. It uses a single database and does not use Graph Node sharding.