Segfault while rebuilding index on molfile using bingo.molecule type

Open amhuhn2 opened this issue 1 year ago • 1 comments

Steps to Reproduce

  1. Use the Indigo library (Bingo cartridge). Environment details follow.

Note: this issue is not 100% reproducible in our environment with a given set of data. It seems to happen in roughly 1 in 5 to 1 in 8 runs.

OS: (output from uname -a): Linux bpeqabirdmlapvm02 4.18.0-240.15.1.el8_3.x86_64 #1 SMP Mon Mar 1 17:16:16 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

CentOS Linux release 8.3.2011

32 GB RAM, 8 CPUs

Output from "select version();":

PostgreSQL 12.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit

276 GB available on the filesystem on which PostgreSQL keeps its data and logs.

  2. Add script or SQL to reproduce the issue: see attached file.

script.txt

Actual behavior

During or immediately after the rebuild of the bingo index, the OS records a segfault. The PostgreSQL log records a corrupted double-linked list, then PostgreSQL terminates and restarts.

Attached is an excerpt from /var/log/messages. messages.txt

Attached is an excerpt of the PostgreSQL log: (note that the stored procedure source.uspupsertmolecule executes the CREATE INDEX statement that seems to be failing. The next stored proc in the pipeline is source.uspupsertlot).

Note the error "corrupted double-linked list".

postgresql-log.txt

Expected behavior

The bingo index should be rebuilt with no segfaults thrown, and the pipeline should continue on afterwards as normal. No "corrupted double-linked list" should be mentioned in the PostgreSQL log.

Environment details:

  • Back-end version Bingo 1.9.0

Attachments

Three attachments included.

Additional context

We are in the process of upgrading to the latest version of bingo, but reading through the release notes, I did not see any fixed bugs that looked like the issue we're experiencing. The closest ones had to do with CDX files, and we are parsing molfiles instead.

Fixed in 1.10:

#1068 CDX-loader crash

Fixed in 1.12:

#1126 Segfault when iterating CDX file from USPTO downloads

Fixed in 1.13:

#1139 core dumped when reading CDX file downloaded from USPTO

amhuhn2 avatar Mar 28 '24 18:03 amhuhn2

1. Description:

A segmentation fault (segfault) occurs during or immediately after the rebuild of the Bingo index in PostgreSQL. It is followed by a "corrupted double-linked list" error, termination of the database, and a restart. This affects the stability of the database when MOL files are processed through the source.uspupsertmolecule stored procedure. One possible cause is memory corruption or a buffer overflow in the Bingo cartridge (version 1.9.0) when it handles large MOL files or complex indexing operations.


2. Steps to Reproduce:

  • Step 1: Set up the environment on CentOS Linux release 8.3.2011 with PostgreSQL 12.9 and the Bingo 1.9.0 cartridge.
  • Step 2: Create the regmol.parent_structure table and populate it with 211,000 records, including MOL files (longest 41,000 characters), using the script from script.txt.
  • Step 3: Execute the SQL statement CREATE INDEX idx_bingo_bingo_config ON source.parent_structure USING bingo_idx (mol_file COLLATE pg_catalog."default" bingo.molecule) within the source.uspupsertmolecule stored procedure.
  • Step 4: Monitor the system logs (/var/log/messages) and PostgreSQL logs (postgresql-log.txt) for segfaults or errors during index creation.
  • Note: The issue is not 100% reproducible, occurring approximately 1/5 to 1/8 of the time with the given dataset.
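
Because the failure is intermittent, a single clean rebuild says little about whether a change fixed it. As a rough sketch, assuming each rebuild fails independently with a fixed probability (an assumption; the report only gives the observed 1/5 to 1/8 range), the number of consecutive clean rebuilds needed before a fix can be trusted could be estimated as follows. The function name clean_runs_needed and the 95% default are illustrative choices, not part of the report.

```python
import math


def clean_runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that n consecutive clean runs would happen by chance
    with probability at most (1 - confidence) if the bug were still present.

    Assumes independent runs with a constant failure probability.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # The two ends of the failure-rate range quoted in the report.
    for rate in (1 / 5, 1 / 8):
        print(f"failure rate {rate:.3f}: {clean_runs_needed(rate)} clean rebuilds")
```

Under these assumptions, about 14 consecutive clean rebuilds are needed at a 1/5 failure rate and about 23 at 1/8.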

3. Expected Behavior:

The Bingo index should be rebuilt successfully without throwing segfaults, and the pipeline (including source.uspupsertmolecule and subsequent source.uspupsertlot) should continue normally. The PostgreSQL log should not contain errors such as "corrupted double-linked list."


4. Actual Behavior:

During or after the CREATE INDEX operation, a segfault is recorded in the kernel log (messages.txt), followed by a "corrupted double-linked list" error in the PostgreSQL log (postgresql-log.txt). This leads to PostgreSQL terminating (PID 197922) with signal 6 (Aborted), triggering a restart and recovery mode. The error reproduces intermittently (1/5 to 1/8 of attempts). Logs show:

  • Segfault at address c in bingo_postgres.so (stack trace in messages.txt).
  • Warnings about terminating connections due to shared memory corruption (postgresql-log.txt).
  • Recovery mode activation until the database is ready again.
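
To compare crashes across runs, the kernel segfault records can be pulled out of /var/log/messages programmatically. The sketch below assumes the usual x86 Linux line format ("segfault at ... ip ... sp ... error ... in module[base+size]"). The sample line is made up for illustration and is not taken from messages.txt, and the names SEGFAULT_RE and parse_segfaults are hypothetical. The computed offset is relative to the start of the mapping reported by the kernel, so it is only an approximate starting point for symbol lookup.

```python
import re

# Matches the usual x86 kernel segfault record; the "in module[base+size]"
# part is optional because the kernel omits it when no mapping is found.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\d+)"
    r"(?: in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfaults(lines):
    """Return one dict per segfault record found in the given log lines."""
    events = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match is None:
            continue
        event = match.groupdict()
        event["pid"] = int(event["pid"])
        if event["module"] is not None:
            # Offset of the faulting instruction inside the reported mapping.
            event["offset"] = hex(int(event["ip"], 16) - int(event["base"], 16))
        else:
            event["offset"] = None
        events.append(event)
    return events


# Made-up sample lines in the assumed format (not real log data).
sample = [
    "Jan  1 00:00:01 host kernel: postgres[12345]: segfault at c "
    "ip 00007f0000012345 sp 00007ffc00000000 error 4 "
    "in bingo_postgres.so[7f0000000000+400000]",
    "Jan  1 00:00:02 host systemd[1]: Started Session 1 of user postgres.",
]

if __name__ == "__main__":
    for event in parse_segfaults(sample):
        print(event["proc"], event["pid"], event["module"], event["offset"])
```

If the same offset shows up in every crash, that points to a single faulting code path; varying offsets would be more consistent with earlier heap corruption surfacing in different places.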

5. Analysis of the Problem:

The problem likely originates from a bug in the Bingo cartridge (version 1.9.0) during index creation, possibly due to improper memory management or handling of large MOL files (up to 41,000 characters). Key observations:

  • The segfault in bingo_postgres.so suggests a memory access violation, potentially a buffer overflow or uninitialized pointer.
  • The "corrupted double-linked list" indicates heap corruption, which could result from invalid memory writes during MOL file processing or indexing.
  • The warning "stereogroup number 8 out of range" hints at a parsing error in stereochemical data, which may trigger the crash.
  • Involved modules: Bingo PostgreSQL extension (bingo_postgres.so), PostgreSQL backend.
  • Confirmation from a business analyst is needed to validate MOL file content and stereochemistry.
  • The issue’s intermittency (1/5 to 1/8) may be tied to memory allocation patterns or specific MOL file characteristics.
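
One way to narrow down candidate records is to rank the MOL files by their highest stereo group number and by length before indexing. The sketch below assumes that the "stereogroup" warning relates to V3000 enhanced-stereo collections (MDLV30/STERACn and MDLV30/STERELn lines). That is an assumption: the report does not say whether the MOL files are V2000 or V3000, nor which record triggers the warning. The names stereo_groups and summarize and the sample fragments are hypothetical.

```python
import re

# V3000 enhanced-stereo collection lines, e.g. "M  V30 MDLV30/STERAC1 ATOMS=(2 3)".
STEREO_COLLECTION_RE = re.compile(
    r"^M  V30 MDLV30/(STERAC|STEREL)(\d+)", re.MULTILINE
)


def stereo_groups(molfile: str):
    """Return (collection kind, group number) pairs found in a MOL file."""
    return [(kind, int(num)) for kind, num in STEREO_COLLECTION_RE.findall(molfile)]


def summarize(records):
    """records: iterable of (record_id, molfile_text) pairs.

    Returns one summary dict per record, sorted so that the highest
    stereo group numbers (then the longest files) come first.
    """
    report = []
    for record_id, molfile in records:
        groups = stereo_groups(molfile)
        report.append(
            {
                "id": record_id,
                "length": len(molfile),
                "max_group": max((n for _, n in groups), default=0),
            }
        )
    report.sort(key=lambda r: (r["max_group"], r["length"]), reverse=True)
    return report


# Made-up fragments for illustration; these are not complete MOL files.
records = [
    ("a", "M  V30 BEGIN COLLECTION\nM  V30 MDLV30/STERAC8 ATOMS=(2 3)\nM  V30 END COLLECTION\n"),
    ("b", "no stereo collections here\n"),
    ("c", "M  V30 MDLV30/STEREL1 ATOMS=(1 4)\nM  V30 MDLV30/STERAC2 ATOMS=(5)\n"),
]

if __name__ == "__main__":
    for row in summarize(records):
        print(row)
```

Records near the top of the list could then be indexed in a scratch table first to see whether they reproduce the warning or the crash.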

6. Suggested Solutions:

  • High-level solution: Implement input validation and limit checks for MOL files (e.g., size and stereochemistry) before indexing to prevent memory corruption.
  • Technical solution: Update to a newer Bingo version (e.g., 1.10 or later) and apply a patch to handle stereogroup out-of-range errors. Example workaround:
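    CREATE OR REPLACE FUNCTION safe_upsertmolecule(varchar, varchar) RETURNS void AS $$
    BEGIN
        -- $1 is the MOL file text; $2 is unused in this sketch.
        -- Skip the rebuild when the MOL file exceeds the size limit.
        IF LENGTH($1) > 40000 THEN
            RAISE NOTICE 'MOL file too large, skipping index creation';
            RETURN;
        END IF;
        -- Drop any existing index first so that a rebuild does not fail with "already exists".
        DROP INDEX IF EXISTS source.idx_bingo_bingo_config;
        CREATE INDEX idx_bingo_bingo_config ON source.parent_structure USING bingo_idx (mol_file COLLATE pg_catalog."default" bingo.molecule);
    EXCEPTION
        WHEN OTHERS THEN
            -- Traps SQL-level errors only.
            RAISE NOTICE 'Index creation failed: %', SQLERRM;
    END;
    $$ LANGUAGE plpgsql;
    
    Use this function in place of the index-creation step in source.uspupsertmolecule. Note that the EXCEPTION handler traps SQL-level errors only; it cannot intercept a segfault in the backend process, so this is a mitigation rather than a fix.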
  • Documentation improvement: Update Bingo documentation to highlight known issues with large MOL files or stereochemistry in version 1.9.0.

mobilisf avatar Jun 19 '25 09:06 mobilisf