graph-node
graph-node copied to clipboard
Spurious error on DB setup
DB setup has been observed to fail in the hosted service with the following error logs:
2022-03-16T19:29:16.264973019Z stderr F thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', /graph-node/store/postgres/src/connection_pool.rs:513:10
2022-03-16 19:29:16
2022-03-16T19:29:16.264648288Z stderr F Mar 16 19:29:16.264 CRIT migrations failed, error: store error: failed to find plan for subquery ss, pool: main, shard: blocks_a, component: ConnectionPool
2022-03-16 19:29:16
2022-03-16T19:29:16.264640948Z stderr F note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2022-03-16 19:29:16
2022-03-16T19:29:16.264553508Z stderr F thread 'tokio-runtime-worker' panicked at 'migrations failed: store error: failed to find plan for subquery ss', store/postgres/src/connection_pool.rs:987:13
This causes graph node to crash, until it the setup eventually succeeds for all shards. It's been observed on both PG 13 and PG 14 shards.
The error comes from this line in the Postgres sources.
From PG logs, the query where that happens is
select c.column_name::text, c.data_type::text,
c.udt_name::text, c.udt_schema::text, e.data_type::text as elem_type
from information_schema.columns c
left join information_schema.element_types e
on ((c.table_catalog, c.table_schema, c.table_name, 'TABLE', c.dtd_identifier)
= (e.object_catalog, e.object_schema, e.object_name, e.object_type, e.collection_type_identifier))
where c.table_schema = $1
and c.table_name = $2
order by c.ordinal_position
I've manually run this query with all possible values for the bind variables that are used in the code, without getting an error. e.g., $1 = 'public', $2 = 'chains and $1 = 'subgraphs', $2 = 'subgraph' without getting this error.
Also worth noting: the error and the query are the only things that backend logs; I can't find any other log messages from that backend (assuming that the [nnn] in the logs is the pid of the backend)
checking here @lutter are you saying that this isn't a Graph Node issue, or is more investigation required?
It's not a graph-node issue; I've raised that with some external parties, but everybody seems to be stumped as to what causes that.
In my case, this error occurs when I try to use the information_schema.element_types table in a LEFT JOIN, I used this table to get the type of a field when it was for example integer[], because in these types the data_type column of the information_schema.columns table was just ARRAY.
I could not find out the reason for the error when querying this table, for this reason I replaced the query table with pg_catalog.pg_attribute, as shown below, and with that I had no more problems:
SELECT c.table_schema, c.table_name, c.column_name, pg_catalog.format_type(a.atttypid, a.atttypmod) as data_type
FROM information_schema.columns c
INNER JOIN information_schema.tables t ON (c.table_schema = t.table_schema and c.table_name = t.table_name)
INNER JOIN pg_catalog.pg_attribute a ON ((c.table_schema||'.'||c.table_name)::regclass = a.attrelid and a.attname = c.column_name)
WHERE c.table_name = 'table_name'
ORDER BY 1, 2