graph-node icon indicating copy to clipboard operation
graph-node copied to clipboard

Spurious error on DB setup

Open leoyvens opened this issue 3 years ago • 4 comments

DB setup has been observed to fail in the hosted service with the following error logs:

2022-03-16T19:29:16.264973019Z stderr F thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', /graph-node/store/postgres/src/connection_pool.rs:513:10
2022-03-16 19:29:16	
2022-03-16T19:29:16.264648288Z stderr F Mar 16 19:29:16.264 CRIT migrations failed, error: store error: failed to find plan for subquery ss, pool: main, shard: blocks_a, component: ConnectionPool
2022-03-16 19:29:16	
2022-03-16T19:29:16.264640948Z stderr F note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2022-03-16 19:29:16	
2022-03-16T19:29:16.264553508Z stderr F thread 'tokio-runtime-worker' panicked at 'migrations failed: store error: failed to find plan for subquery ss', store/postgres/src/connection_pool.rs:987:13

This causes graph node to crash, until it the setup eventually succeeds for all shards. It's been observed on both PG 13 and PG 14 shards.

leoyvens avatar Mar 17 '22 19:03 leoyvens

The error comes from this line in the Postgres sources.

From PG logs, the query where that happens is

select c.column_name::text, c.data_type::text,
           c.udt_name::text, c.udt_schema::text, e.data_type::text as elem_type
      from information_schema.columns c
      left join information_schema.element_types e
           on ((c.table_catalog, c.table_schema, c.table_name, 'TABLE', c.dtd_identifier)
            = (e.object_catalog, e.object_schema, e.object_name, e.object_type, e.collection_type_identifier))
     where c.table_schema = $1
       and c.table_name = $2
     order by c.ordinal_position

I've manually run this query with all possible values for the bind variables that are used in the code, without getting an error. e.g., $1 = 'public', $2 = 'chains and $1 = 'subgraphs', $2 = 'subgraph' without getting this error.

Also worth noting: the error and the query are the only things that backend logs; I can't find any other log messages from that backend (assuming that the [nnn] in the logs is the pid of the backend)

lutter avatar Mar 17 '22 20:03 lutter

checking here @lutter are you saying that this isn't a Graph Node issue, or is more investigation required?

azf20 avatar Apr 29 '22 11:04 azf20

It's not a graph-node issue; I've raised that with some external parties, but everybody seems to be stumped as to what causes that.

lutter avatar Apr 29 '22 17:04 lutter

In my case, this error occurs when I try to use the information_schema.element_types table in a LEFT JOIN, I used this table to get the type of a field when it was for example integer[], because in these types the data_type column of the information_schema.columns table was just ARRAY.

I could not find out the reason for the error when querying this table, for this reason I replaced the query table with pg_catalog.pg_attribute, as shown below, and with that I had no more problems:

SELECT c.table_schema, c.table_name, c.column_name, pg_catalog.format_type(a.atttypid, a.atttypmod) as data_type
FROM information_schema.columns c
INNER JOIN information_schema.tables t ON (c.table_schema = t.table_schema and c.table_name = t.table_name)
INNER JOIN pg_catalog.pg_attribute a ON ((c.table_schema||'.'||c.table_name)::regclass = a.attrelid and a.attname = c.column_name)
WHERE c.table_name = 'table_name'
ORDER BY 1, 2

ricardocrescenti avatar Jul 14 '22 17:07 ricardocrescenti