graph-node
[Bug] Impossible combination of entity operations
Bug report
A subgraph that was running without issues on v0.34.0 suddenly started failing in v0.35.0. This subgraph is deployed across many networks and they all started failing with this issue which suggests that this is a regression in v0.35.0.
Relevant log output
May 28 13:26:21.945 ERRO Subgraph failed with non-deterministic error: Failed to transact block operations: internal constraint violated: impossible combination of entity operations: Remove { key: EntityKey(SplitRecipient[0x7c29ca34b44d388ab031ecce7781f2420e1e5c99-0xfa9aad02ffede509520e27ef329ee28871a76828-5], cr=0), block: 15264023 } and then Remove { key: EntityKey(SplitRecipient[0x7c29ca34b44d388ab031ecce7781f2420e1e5c99-0xfa9aad02ffede509520e27ef329ee28871a76828-5], cr=0), block: 15267303 }, retry_delay_s: 108, attempt: 0, sgd: 1, subgraph_id: QmcpChELh7eJShPHvG5zLBUYBsBQby9KZ8roh7BrT2Yp5B, component: SubgraphInstanceManager
IPFS hash
QmcpChELh7eJShPHvG5zLBUYBsBQby9KZ8roh7BrT2Yp5B
Subgraph name or link to explorer
No response
Some information to help us out
- [X] Tick this box if this bug is caused by a regression found in the latest release.
- [ ] Tick this box if this bug is specific to the hosted service.
- [X] I have searched the issue tracker to make sure this issue is not a duplicate.
OS information
Linux
We also see a second error on these subgraphs, although it happens less reliably: Failed to transact block operations: internal constraint violated: Batches must go forward. Can't append a batch with block pointer #114200817
This issue seems like it might be related to batching. I tried to bisect, but the issue doesn't reproduce reliably on any commit. I thought I had bisected it down to 31943fc706c84e8afe4a3677b7cf172339d72461, but when I tested the previous commit I didn't find the issue there either. I then switched back to 31943fc706c84e8afe4a3677b7cf172339d72461 and now the issue isn't happening. Very unusual. It also doesn't make sense that this would be the offending commit.
Now I'm thinking this might be a subgraph bug that wasn't revealed until we upgraded to v0.35.0.
Setting GRAPH_STORE_WRITE_BATCH_SIZE=0 seems to resolve the issue.
The only commits I see between 0.34.0 and 0.35.0 related to batching are for enabling/disabling batching based on whether the subgraph is caught up. In my local testing the subgraph is still catching up, so batching is definitely enabled. Did any other batching changes happen between these two releases? cc @leoyvens @lutter
Alternatively, could there be some changes to the logic that affect loading entities? The subgraph in question has a flow like:
- parse list of addresses in event
- load the relevant entity
- for any addresses that existed in the entity before but do not exist in the current event, use store.remove to remove them
- save the entity with the latest list of addresses
Since we see two remove modifications here, could it be that something is going wrong in step 4 (not properly saving before committing) or step 2 (not properly loading during a batch)?
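For readers following along, here is a minimal sketch of the four-step flow described above. It is not the actual subgraph code: `MemoryStore`, `handleUpdate`, and the `Split`/`Recipient` shapes are hypothetical stand-ins for the graph-ts store and the real entities, written in plain TypeScript so it can run outside of graph-node.

```typescript
// Hypothetical in-memory stand-in for the subgraph store; NOT the graph-ts API.
type Recipient = { id: string };
type Split = { id: string; recipients: string[] };

class MemoryStore {
  splits = new Map<string, Split>();
  recipients = new Map<string, Recipient>();
  removeLog: string[] = []; // records every remove call, for inspection

  removeRecipient(id: string): void {
    this.removeLog.push(id);
    this.recipients.delete(id);
  }
}

// Steps 1-4 from the list above, for a single event.
function handleUpdate(store: MemoryStore, splitId: string, addresses: string[]): void {
  // Step 1: the parsed address list arrives as `addresses`.
  // Step 2: load the relevant entity, or start from an empty one.
  const split: Split = store.splits.get(splitId) || { id: splitId, recipients: [] };

  // Step 3: remove recipients that existed before but are absent from this event.
  const current = new Set(addresses);
  for (const address of split.recipients) {
    if (!current.has(address)) {
      store.removeRecipient(`${splitId}-${address}`);
    }
  }

  // Step 4: save the entity with the latest list of addresses.
  for (const address of addresses) {
    const id = `${splitId}-${address}`;
    store.recipients.set(id, { id });
  }
  store.splits.set(splitId, { id: splitId, recipients: addresses.slice() });
}
```

In this model, calling `handleUpdate` with `["a", "b"]` and then twice with `["a"]` issues exactly one remove, because step 4 persists the shortened list before the next event is handled. A second remove for the same key would only appear if step 2 returned a stale entity, which is the scenario the question above is asking about.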
Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.
I seem to have run into this same issue. With subgraph IPFS hash QmfA2FrsjAz5EEK5NuyqDGhE33Umnhbzbc1YEeVVDb6TgL, I get:
Error: Failed to transact block operations: internal constraint violated: impossible combination of entity operations: Remove { key: EntityKey(PackShareContent[0x575700002a090000], cr=0), block: 67096196 } and then Remove { key: EntityKey(PackShareContent[0x575700002a090000], cr=0), block: 67100212 }
What the subgraph does is similar to the OP's: For handlers of certain events, it will iterate through a list of entities (that are linked to another entity via 1:n relation and a derived field), removing them from the store. In the first of the two blocks in question (from the error), an event is fired that will trigger a "full refresh" that takes some time to process. In the second of the two blocks, an event is fired that refreshes just a select few entities. The error sounds like the indexer is trying to process them in parallel, and thus a duplicate remove is attempted. Or the second event's handler somehow does not correctly see that the entities have already been removed previously / an update is in progress for them.
@domob1812 Sorry for the long radio silence, just coming back to this: can you confirm that in your case the subgraph removes entities without first checking if they exist? That should be OK, I just want to make sure that that's the case here. I can then change graph-node to allow that.
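To make the distinction in this question concrete, the sketch below contrasts removing without an existence check against removing only when the entity is present. `ToyStore`, `removeUnchecked`, and `removeIfPresent` are hypothetical names and do not mirror the graph-ts `store` API. Whether the guarded variant avoids the error on affected graph-node versions is not established in this thread.

```typescript
// Hypothetical stand-in for the store; the real graph-ts `store` API differs.
class ToyStore {
  private entities = new Map<string, { id: string }>();
  removeCalls: string[] = []; // every remove that was issued, in order

  set(id: string): void {
    this.entities.set(id, { id });
  }

  get(id: string): { id: string } | null {
    return this.entities.get(id) || null;
  }

  remove(id: string): void {
    this.removeCalls.push(id);
    this.entities.delete(id);
  }
}

// Pattern A: remove without checking. If the entity is already gone,
// the data does not change, but a remove operation is still issued.
function removeUnchecked(store: ToyStore, ids: string[]): void {
  for (const id of ids) {
    store.remove(id);
  }
}

// Pattern B: issue a remove only when the entity currently exists.
function removeIfPresent(store: ToyStore, ids: string[]): void {
  for (const id of ids) {
    if (store.get(id) !== null) {
      store.remove(id);
    }
  }
}
```

Running pattern A twice for the same id issues two removes for one key, which matches the shape of the "Remove ... and then Remove" pair in the error message. Pattern B issues only one in this toy model.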
@lutter No worries, I have updated my subgraph to not use immutable entities just in case and it is working fine.
The subgraph uses store.remove to remove entities that have been enumerated via a derived field (using store.loadRelated in the code generated from the schema).
I just opened a PR that will fix this problem. Once that's out and deployed, you should be able to go back to using immutable entities, as that will be much faster for queries.