Indigo
Indigo copied to clipboard
Performing an exact match search results in PostgreSQL termination
Summary Performing an exact match search results in PostgreSQL termination
Steps to Reproduce
-
RHEL 8.10, PostgreSQL 15, and Bingo bingo-postgres15-linux-x86_64-1.29.0.0. I have confirmed the same issue exists with every point release between 1.25.0.0 and 1.29.0.0.
-
Add script or SQL to reproduce the issue
/* This fails */
SELECT g_0."parent_structure_id"
FROM "regmol"."parent_structure" AS g_0
WHERE (g_0."mol_file" @ ($1, '0.1')::bingo.exact) = TRUE;
/* This succeeds with no errors */
SELECT g_0."parent_structure_id"
FROM "regmol"."parent_structure" AS g_0
WHERE (g_0."mol_file" @ ($1, '')::bingo.sub) = TRUE;
Actual behavior PostgreSQL terminates with the following errors in the log:
(sometimes the following) corrupted size vs. prev_size while consolidating (OR sometimes the following) double free or corruption (top) (always the following) ERROR: error: bingo search engine: unknown index type 0
Expected behavior A result set to be returned, without terminating PostgreSQL.
Environment details: RHEL 8.10, PostgreSQL 15, and Bingo bingo-postgres15-linux-x86_64-1.29.0.0. I have confirmed the same issue exists with every point release between 1.25.0.0 and 1.29.0.0.
We did not have this issue on our previous environment: CentOS 8.3, PostgreSQL 12, and Bingo 1.9.1.r1
Attachments If applicable, add attachment files to reproduce the issue.
Additional context The error is not 100% consistent. Also, the following two errors (as reported by bingo.checkmolecule()) exist in our TEST environment (where we are receiving the catastrophic error) but not in our dev environment (where we are NOT experiencing the error):
stereocenters: have hydrogen(s) besides implicit hydrogen near stereocenter <number>
molfile loader: empty pseudo-atom
I will be testing Bingo 1.33 when it is released next seek, to see if it fixes the error.
I am also getting this behavior. PostgreSQL 17 with the bingo binary built from source off of master.
All bingo functions work fine on a toy database with very few records, but with an index with millions of records exact crashes the DB server and sub/sim never complete their queries.
I'm getting a related behavior (in my case "ERROR: improper call to spi_printtup" is shown) when I defined a trigger with a ::bingo.exact query while the same trigger with a ::bingo.sub query behave properly
PostgreSQL Termination During Exact Match Search with Bingo
1. Problem Description:
Executing an exact match search (::bingo.exact) on a PostgreSQL database with the Bingo cartridge leads to PostgreSQL termination. This issue occurs in environments running RHEL 8.10/PostgreSQL 15 with Bingo versions 1.25.0.0 to 1.29.0.0. It is also confirmed with PostgreSQL 17. Errors in the PostgreSQL log include, but are not limited to:
ERROR: bingo search engine: unknown index type 0corrupted size vs. prev_size while consolidatingdouble free or corruption (top)
In contrast, substructure searches (::bingo.sub) succeed without errors. The problem appears more frequently on large datasets with millions of records and databases containing invalid molecular structures.
2. Steps to Reproduce:
-
Environment Setup:
- OS: RHEL 8.10
- PostgreSQL: Version 15 (or 17 for confirmation)
- Bingo Cartridge:
bingo-postgres15-linux-x86_64-1.29.0.0or any version from 1.25.0.0 onward.
-
Database Configuration:
- Populate the
regmol.parent_structuretable with millions of chemical structures, ensuring themol_filecolumn is indexed using Bingo'sbingo.exactindex type.
- Populate the
-
SQL Queries:
- Failing Query:
SELECT g_0."parent_structure_id" FROM "regmol"."parent_structure" AS g_0 WHERE (g_0."mol_file" @ ($1, '0.1')::bingo.exact) = TRUE; - Succeeding Query:
SELECT g_0."parent_structure_id" FROM "regmol"."parent_structure" AS g_0 WHERE (g_0."mol_file" @ ($1, '')::bingo.sub) = TRUE;
- Failing Query:
-
Expected Behavior:
- The
::bingo.exactquery should return a valid result (empty or non-empty dataset) without crashing PostgreSQL.
- The
-
Actual Behavior:
- PostgreSQL server terminates during the execution of the
::bingo.exactquery. - Logs consistently show:
corrupted size vs. prev_size while consolidatingdouble free or corruption (top)ERROR: bingo search engine: unknown index type 0
- PostgreSQL server terminates during the execution of the
3. Hypothesized Root Causes:
-
Memory Management Bug:
- The observed memory corruption errors (
corrupted size vs. prev_size,double free or corruption) suggest improper allocation, deallocation, or access to memory in Bingo'sexactmatch routine.
- The observed memory corruption errors (
-
Index Handling Issue:
- The
unknown index type 0error points to a failure in reading or interpreting the index during exact matching. - Possible causes: version mismatch between index creation and cartridge expectations, or index corruption in large datasets.
- The
-
PostgreSQL Version Incompatibility:
- Breaking changes in PostgreSQL APIs between version 12 and 15/17 could lead to inconsistencies in how extensions, like Bingo, interact with PostgreSQL, especially concerning memory management.
-
Data-Specific Sensitivities:
- Errors like "stereocenters: have hydrogen(s)" and "molfile loader: empty pseudo-atom" from
bingo.checkmolecule()indicate problematic structures in the test environment. These could trigger unhandled conditions in the Bingo exact match logic.
- Errors like "stereocenters: have hydrogen(s)" and "molfile loader: empty pseudo-atom" from
4. Proposed Solutions:
High-Level Solutions:
- Conduct a thorough debugging of Bingo's
exactsearch algorithm, focusing on memory allocation, index management, and PostgreSQL compatibility. - Enhance error handling and validation within Bingo to detect and respond to invalid indexes or problematic data without crashing PostgreSQL.
Technical Solutions:
-
Memory Debugging:
- Use tools like Address Sanitizer (ASan) or Valgrind to identify specific memory management issues (e.g., use-after-free, buffer overflows) in the Bingo codebase.
- Conduct a detailed code review of the
::bingo.exactfunctionality.
-
Improve Index Handling:
- Investigate the root cause of the
unknown index type 0error and implement robust validations to identify and reject invalid index states before queries execute.
- Investigate the root cause of the
-
PostgreSQL Compatibility:
- Ensure the Bingo cartridge is compiled specifically for the target PostgreSQL version, addressing potential API changes.
- Align Bingo's C/C++ codebase with PostgreSQL 15/17 memory context handling requirements.
-
Handle Data Sensitivity:
- Pre-validate chemical structures with
bingo.checkmolecule()before indexing to avoid introducing malformed molecules.
- Pre-validate chemical structures with
Hi @mobilisf . it will be great if you provide "table with millions of chemical structures" at least provide the datasource (is it pubchem structures or chembl, etc)
@AlexanderSavelyev My example would be Pubchem structures as the identity source.
Can bingo PostgreSQL scale to pubchem size? Are there any techniques to apply to make bingo search faster or more reliable on large datasets? Currently, doing an exact search crashes PostgreSQL, and doing any other type of search, the query never completes.
I have tried Bingo 1.33.0-rc2. It still produces segfaults in PostgreSQL, no change from version 1.29. Note that I also tried 1.30, which caused the same errors.
I have isolated to specific records which cause the issue. I will see if I can produce non-sensitive test data to reproduce the problem.
In my case I had built bingo for postgresql 17 from the master branch, since this PR added postgres 17 support to bingo https://github.com/epam/Indigo/pull/2796
Cannot reproduce the issue.
Bingo version 1.26.0. PostgreSQL version 14
I prepared DB with 42+M molecules from https://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF but issue not reproduced.
I also try request from #1456 - it works ok too.
I am attaching a script to reliably reproduce this issue on RHEL 8. Please note that I am noticing that the issue only occurs when performing a join between two tables, not when selecting against a single table.
/*
Create schema to contain the objects used in this test:
a) Table compound
b) Table compound_security
Both tables contain a column, compound_id, that is used to join the two tables.
After creating both tables, create a bingo index on table compound.
Populate both tables with one row each.
SELECT from the compound table using the bingo index. This succeeds.
SELECT from a join of both tables, usin the bingo index. This fails.
*/
/* Create a new schema */
DROP SCHEMA IF EXISTS issue2922 CASCADE;
CREATE SCHEMA IF NOT EXISTS issue2922
AUTHORIZATION andrew_huhn;
/* Create table compound */
DROP TABLE IF EXISTS issue2922.compound;
CREATE TABLE issue2922.compound
(
compound_id INTEGER NOT NULL,
mol_file text,
PRIMARY KEY (compound_id)
);
ALTER TABLE IF EXISTS issue2922.compound
OWNER to postgres;
/* Create a bingo index on table compound */
CREATE INDEX IF NOT EXISTS idx_compound
ON issue2922.compound USING bingo_idx
(mol_file bingo.molecule);
/* Create table compound_security */
DROP TABLE IF EXISTS issue2922.compound_security;
CREATE TABLE issue2922.compound_security
(
compound_id INTEGER NOT NULL,
PRIMARY KEY (compound_id)
);
ALTER TABLE IF EXISTS issue2922.compound_security
OWNER to postgres;
/* Populate both tables with one row each */
insert
into issue2922.compound_security
( compound_id )
values
( 736832 );
insert
into issue2922.compound
( compound_id, mol_file )
values ( 736832, 'Unnamed
MolEngine06122512002D
0 0 0 999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 14 13 0 0 0
M V30 BEGIN ATOM
M V30 1 C 14.907 -3.539 0 0
M V30 2 C 16.258 -2.759 0 0
M V30 3 H 16.258 -1.199 0 0
M V30 4 C 17.609 -3.539 0 0
M V30 5 C 18.96 -2.759 0 0
M V30 6 H 18.96 -1.199 0 0
M V30 7 H 20.311 -3.539 0 0
M V30 8 H 17.609 -5.099 0 0
M V30 9 H 14.907 -5.099 0 0
M V30 10 H 13.556 -2.759 0 0
M V30 11 H 14.907 -1.979 0 0
M V30 12 H 16.258 -4.319 0 0
M V30 13 H 17.609 -1.979 0 0
M V30 14 H 20.311 -1.979 0 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 1 1 1 2
M V30 2 1 2 3
M V30 3 1 2 4
M V30 4 1 4 5
M V30 5 1 5 6
M V30 6 1 5 7
M V30 7 1 4 8
M V30 8 1 1 9
M V30 9 1 1 10
M V30 10 1 1 11
M V30 11 1 2 12
M V30 12 1 4 13
M V30 13 1 5 14
M V30 END BOND
M V30 END CTAB
M END' );
/* Now we get into the SELECTs to demonstrate the problem.
The following succeeds. The bingo index is used to select
from the compound table. A single row, with c=1, is returned.
*/
SELECT count(1) as c
FROM issue2922.compound AS c
WHERE (c.mol_file @ ('Unnamed
MolEngine06122512002D
0 0 0 999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 14 13 0 0 0
M V30 BEGIN ATOM
M V30 1 C 14.907 -3.539 0 0
M V30 2 C 16.258 -2.759 0 0
M V30 3 H 16.258 -1.199 0 0
M V30 4 C 17.609 -3.539 0 0
M V30 5 C 18.96 -2.759 0 0
M V30 6 H 18.96 -1.199 0 0
M V30 7 H 20.311 -3.539 0 0
M V30 8 H 17.609 -5.099 0 0
M V30 9 H 14.907 -5.099 0 0
M V30 10 H 13.556 -2.759 0 0
M V30 11 H 14.907 -1.979 0 0
M V30 12 H 16.258 -4.319 0 0
M V30 13 H 17.609 -1.979 0 0
M V30 14 H 20.311 -1.979 0 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 1 1 1 2
M V30 2 1 2 3
M V30 3 1 2 4
M V30 4 1 4 5
M V30 5 1 5 6
M V30 6 1 5 7
M V30 7 1 4 8
M V30 8 1 1 9
M V30 9 1 1 10
M V30 10 1 1 11
M V30 11 1 2 12
M V30 12 1 4 13
M V30 13 1 5 14
M V30 END BOND
M V30 END CTAB
M END', '0.1')::bingo.exact) = TRUE;
/* The following fails with this error:
ERROR: could not find block containing chunk 0x64fdbb8
SQL state: XX000
It is attempting to select from a JOIN of the two tables,
using the bingo index in the WHERE clause.
*/
SELECT count(1) as c
FROM issue2922.compound AS c
INNER JOIN
issue2922.compound_security AS cs
ON c.compound_id = cs.compound_id
WHERE (c.mol_file @ ('Unnamed
MolEngine06122512002D
0 0 0 999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 14 13 0 0 0
M V30 BEGIN ATOM
M V30 1 C 14.907 -3.539 0 0
M V30 2 C 16.258 -2.759 0 0
M V30 3 H 16.258 -1.199 0 0
M V30 4 C 17.609 -3.539 0 0
M V30 5 C 18.96 -2.759 0 0
M V30 6 H 18.96 -1.199 0 0
M V30 7 H 20.311 -3.539 0 0
M V30 8 H 17.609 -5.099 0 0
M V30 9 H 14.907 -5.099 0 0
M V30 10 H 13.556 -2.759 0 0
M V30 11 H 14.907 -1.979 0 0
M V30 12 H 16.258 -4.319 0 0
M V30 13 H 17.609 -1.979 0 0
M V30 14 H 20.311 -1.979 0 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 1 1 1 2
M V30 2 1 2 3
M V30 3 1 2 4
M V30 4 1 4 5
M V30 5 1 5 6
M V30 6 1 5 7
M V30 7 1 4 8
M V30 8 1 1 9
M V30 9 1 1 10
M V30 10 1 1 11
M V30 11 1 2 12
M V30 12 1 4 13
M V30 13 1 5 14
M V30 END BOND
M V30 END CTAB
M END', '0.1')::bingo.exact) = TRUE;
Verified at Ubuntu Ubuntu 22.04.5 LTS
% lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
PostgeSQL 12:
% apt list --installed | grep postgre
postgresql-12/jammy-pgdg,now 12.22-2.pgdg22.04+1 amd64 [installed]
postgresql-client-12/jammy-pgdg,now 12.22-2.pgdg22.04+1 amd64 [installed]
postgresql-client-common/jammy-pgdg,now 281.pgdg22.04+1 all [installed,automatic]
postgresql-common-dev/jammy-pgdg,now 281.pgdg22.04+1 all [installed,automatic]
postgresql-common/jammy-pgdg,now 281.pgdg22.04+1 all [installed,automatic]
postgresql-server-dev-12/jammy-pgdg,now 12.22-2.pgdg22.04+1 amd64 [installed]
Request with JOIN return same error, connection lost:
...............
postgres'# M END', '0.1')::bingo.exact) = TRUE;
ERROR: could not find block containing chunk 0x559cccea6648
postgres=# SELECT count(1) as c
FROM issue2922.compound AS c
INNER JOIN
issue2922.compound_security AS cs
ON c.compound_id = cs.compound_id
WHERE (c.mol_file @ ('Unnamed
..........
M END', '0.1')::bingo.exact) = TRUE;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
After reconnect request return different error:
ERROR: error: bingo buffer: internal error: corrupted block 1 data len is -8
Verified at CentOS 8.3.2011 PostgreSQL 10.15 Bingo 1.9.1.r1:
[postgres@5f7d7b8cfae9 ~]$ pg_ctl -V
pg_ctl (PostgreSQL) 10.15
[postgres@5f7d7b8cfae9 ~]$ uname -a
CentOS Linux release 8.3.2011
[postgres@5f7d7b8cfae9 ~]$ ls
backups bingo-postgres10-1.9.1.r1-linux64 bingo-postgres10-1.9.1.r1-linux64.zip data logfile
Got same error:
postgres'# M END', '0.1')::bingo.exact) = TRUE;
ERROR: could not find block containing chunk 0x55c6121a5db8
postgres=#
@amhuhn2 Are you sure you haven't had this problem with Bingo 1.9.1.r1?
We did have a similar error 2 or 3 years ago, but it went away. I'm not aware of anything we did on our own to fix it, I can't explain why it started or why it stopped.
We have recently moved to a new server, which caused our upgrade to Postgres 15 and Bingo 1.29.0.0 on RHEL 8. I can say that we have had this issue since day 1 in our new environment, but I can't answer why we stopped having the problem in the old environment.
Verified at CentOS 8.3.2011 PostgreSQL 10.15 Bingo 1.9.1.r1:
[postgres@5f7d7b8cfae9 ~]$ pg_ctl -V pg_ctl (PostgreSQL) 10.15 [postgres@5f7d7b8cfae9 ~]$ uname -a CentOS Linux release 8.3.2011 [postgres@5f7d7b8cfae9 ~]$ ls backups bingo-postgres10-1.9.1.r1-linux64 bingo-postgres10-1.9.1.r1-linux64.zip data logfileGot same error:
postgres'# M END', '0.1')::bingo.exact) = TRUE; ERROR: could not find block containing chunk 0x55c6121a5db8 postgres=#@amhuhn2 Are you sure you haven't had this problem with Bingo 1.9.1.r1?
Hello,
Please see my comments above:
We did have a similar error 2 or 3 years ago, but it went away. I'm not aware of anything we did on our own to fix it, I can't explain why it started or why it stopped.
We have recently moved to a new server, which caused our upgrade to Postgres 15 and Bingo 1.29.0.0 on RHEL 8. I can say that we have had this issue since day 1 in our new environment, but I can't answer why we stopped having the problem in the old environment.
I'd like to give a status update to my users. What's the latest?
Thank you! Andy
i've suffered from this problem for many years...
i think it will occur more easily when multiple transactions rapidly accessing the table with bingo index and joining other table
Cannot reproduce the issue.无法重现该问题。
Bingo version 1.26.0. Bingo 版本 1.26.0。 PostgreSQL version 14 PostgreSQL 版本 14
I prepared DB with 42+M molecules from
https://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDFbut issue not reproduced.我用来自https://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF的 42+M 分子准备了 DB,但问题没有重现。 I also try request from #1456 - it works ok too.我也尝试了来自 #1456 的请求 - 它也运行正常。
@AlexanderSavelyev My example would be Pubchem structures as the identity source.
Can bingo PostgreSQL scale to pubchem size? Are there any techniques to apply to make bingo search faster or more reliable on large datasets? Currently, doing an exact search crashes PostgreSQL, and doing any other type of search, the query never completes.
i used a trick to avoid this error,
by adding a hash index of bingo.cansmiles(smiles)
when performing a exact search, we use where bingo.cansmiels(smiles) = bingo.cansmiles(query)
but apparently, it will lost some funtionality of bingo.exact, like exact match ignoring chiral
at the end, we figure out a fix for the bug to postpone the SPI function to the end of a transaction. The fix will be in 1.37, but if required, we can move it to 1.36