graph-node: Large inserts fail

lutter opened this issue 4 years ago

With the changes from #2223, we can hit the Postgres limit on the number of bind variables that can be used in one statement. That limit is 65535 parameters. We need to change the code that generates the insert/update/delete statements to break such large changes into multiple statements.
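A minimal sketch of the chunking idea, assuming a per-row binding count of 12 for illustration; the names here (`chunk_size`, `per_row`) are not graph-node's actual code:

    // Postgres allows at most 65535 bind parameters in one statement.
    const POSTGRES_MAX_PARAMETERS: usize = 65535;

    // How many rows fit in one INSERT when each row binds
    // `bindings_per_row` parameters (one per attribute column, plus
    // bookkeeping columns such as `vid` and `block_range`).
    fn chunk_size(bindings_per_row: usize) -> usize {
        (POSTGRES_MAX_PARAMETERS / bindings_per_row).max(1)
    }

    fn main() {
        let rows: Vec<u32> = (0..100_000).collect(); // stand-in for entity rows
        let per_row = 12; // assumption: 10 attributes + vid + block_range
        for chunk in rows.chunks(chunk_size(per_row)) {
            // Each chunk becomes one INSERT binding chunk.len() * per_row
            // parameters, which stays under the Postgres limit.
            println!("insert {} rows ({} bindings)", chunk.len(), chunk.len() * per_row);
        }
    }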

lutter avatar Apr 01 '21 16:04 lutter

This happened for QmVCfXsbGGmf73GJxTsWZhtjUunitaUXsoJQH6kMYqQoRP:

name      | pancakeswap/exchange-lite
status    | pending
id        | QmVCfXsbGGmf73GJxTsWZhtjUunitaUXsoJQH6kMYqQoRP
namespace | sgd66151
shard     | evm_compat
chain     | bsc
node_id   | index_node_community_4

Error message:

31/03/2021, 10:15:06 - Subgraph instance failed to run: Error while processing block stream for a subgraph: store error: number of parameters must be between 0 and 65535 , code: SubgraphSyncingFailure, id: QmVCfXsbGGmf73GJxTsWZhtjUunitaUXsoJQH6kMYqQoRP

lutter avatar Apr 02 '21 17:04 lutter

@lutter just had this error too in juanmardefago/loopring36 -> QmWdU7R2QyTV71wZ8z1R8d75bWyLWz4jHLXeoWnzah27zT

juanmardefago avatar Apr 03 '21 20:04 juanmardefago

A Discord user reported that this error showed up in 0.25.2. The error message was:

Error while processing block stream for a subgraph: store error: number of parameters must be between 0 and 65535    

In the code where we compute the chunk size, we add one extra slot for the block_range column, but I'm pretty sure we bind other columns as well, like vid.

The fix I'm imagining would involve adding +2 or +3 to the denominator part of that division to account for all the extra bindings.
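As a sketch of that arithmetic (assuming `vid` and `block_range` are the extra bound columns; the real count per row is exactly what needs auditing):

    const POSTGRES_MAX_PARAMETERS: usize = 65535;

    // Hypothetical current computation: one extra slot for `block_range` only.
    fn chunk_size_current(num_columns: usize) -> usize {
        POSTGRES_MAX_PARAMETERS / (num_columns + 1)
    }

    // Suggested fix: enlarge the denominator so every per-row binding,
    // e.g. `vid` as well, is accounted for.
    fn chunk_size_fixed(num_columns: usize) -> usize {
        POSTGRES_MAX_PARAMETERS / (num_columns + 2)
    }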

tilacog avatar Mar 30 '22 15:03 tilacog

@tilacog is this issue still live?

azf20 avatar Sep 16 '22 22:09 azf20

@azf20 I believe it is, although I haven't heard from other users about it since it was reopened.

tilacog avatar Sep 16 '22 22:09 tilacog

Hi @tilacog, I still face the same issue. Are there any potential solutions?

hangleang avatar Dec 03 '22 04:12 hangleang

@hangleang is this a subgraph in the hosted service? If so, which one?

lutter avatar Dec 05 '22 23:12 lutter

Hi @lutter, yes, it is. The problem got fixed after I removed fullTextSearch from the subgraph; it seems that was causing the problem.

hangleang avatar Dec 07 '22 10:12 hangleang

@lutter just ran into the same issue for the Geo subgraph

Hosted Service: https://thegraph.com/hosted-service/subgraph/makeitrein/geo
IPFS URL: QmTddMMUXw7tsnQzEY1BmR83xwqbCQRWwRXdP7vXZWi1kW
Subgraph Code: https://github.com/geobrowser/geogenesis/tree/full-text-entity-search/packages/subgraph
Error: https://cloud.hasura.io/public/graphiql?endpoint=https%3A%2F%2Fapi.thegraph.com%2Findex-node%2Fgraphql&query=%7B%0A++indexingStatuses%28subgraphs%3A+%5B%22QmTddMMUXw7tsnQzEY1BmR83xwqbCQRWwRXdP7vXZWi1kW%22%5D%29+%7B%0A++++subgraph%0A++++synced%0A++++health%0A++++entityCount%0A++++fatalError+%7B%0A++++++handler%0A++++++message%0A++++++deterministic%0A++++++block+%7B%0A++++++++hash%0A++++++++number%0A++++++%7D%0A++++%7D%0A++++chains+%7B%0A++++++chainHeadBlock+%7B%0A++++++++number%0A++++++%7D%0A++++++earliestBlock+%7B%0A++++++++number%0A++++++%7D%0A++++++latestBlock+%7B%0A++++++++number%0A++++++%7D%0A++++%7D%0A++%7D%0A%7D%0A
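(URL-decoded, that error link runs the following status query against the index node:)

    {
      indexingStatuses(subgraphs: ["QmTddMMUXw7tsnQzEY1BmR83xwqbCQRWwRXdP7vXZWi1kW"]) {
        subgraph
        synced
        health
        entityCount
        fatalError {
          handler
          message
          deterministic
          block { hash number }
        }
        chains {
          chainHeadBlock { number }
          earliestBlock { number }
          latestBlock { number }
        }
      }
    }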

Any pointers would be appreciated

makeitrein avatar Dec 15 '22 18:12 makeitrein

@makeitrein Thanks for the reproducer - I figured out why this is happening, and am working on a fix.

I also noticed that at the failing block 36472865, the subgraph writes enormous amounts of data. The JSON dump of the changes at that block that I took for debugging is ~700 MB, and it seems most of it is for LogEntry (~400 MB) and Triple (~300 MB). I'll see how big that is in the database once I fix the bug, but it is a huge amount of data for one block.

lutter avatar Dec 16 '22 17:12 lutter

Thx for the quick reply! We're using a CSV to populate the initial data set, I'll see if we can explore chunking that out over a few hundred blocks.

makeitrein avatar Dec 16 '22 18:12 makeitrein

I just wasn't sure if this was going to be an ongoing thing. Doing this once to load initial data should be no problem (it'll take several minutes to process that block, but that should be fine).

lutter avatar Dec 19 '22 16:12 lutter

It would be good to be able to create this many entities in a single block with fulltext search enabled. We're disabling fulltext search for now but would love to get it back in when possible.

yanivtal avatar Jan 05 '23 00:01 yanivtal