Segmentation fault issues in pglogical version 2.4.6

Open aniljoshi2022 opened this issue 1 month ago • 1 comments

I keep hitting the same crash reported in https://github.com/2ndQuadrant/pglogical/issues/446 and https://github.com/2ndQuadrant/pglogical/issues/367 when syncing non-partitioned tables to partitioned tables, including on the latest pglogical version, 2.4.6.

2025-11-02 07:25:56.842 UTC [2651842] LOG: background worker "pglogical apply 16593:2528659118" (PID 1158871) was terminated by signal 11: Segmentation fault

I already tried the workaround of adding the non-partitioned table to the provider again with synchronize_data := false, but this doesn't fix the issue.

SELECT pglogical.replication_set_add_table(
    set_name := 'repset',
    relation := 'public.part',
    synchronize_data := false
);
SELECT pglogical.replication_set_add_all_tables(
    set_name := 'repset',
    schema_names := ARRAY['public'],
    synchronize_data := false
);

I understand that a schema mismatch can cause problems, but there should be a better way to handle this kind of inconsistency error: it causes a continuous crash loop on the subscriber node, which makes it impossible to run any query or command there. The only way we found to stop the node from crashing is to stop the subscriber node and then drop the slot on the publisher.

source=# SELECT slot_name, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE '%s_rep%';
       slot_name        | active | restart_lsn
------------------------+--------+-------------
 pgl_target__ode1_s_rep | t      | 0/1B7D230
(1 row)

SELECT pg_drop_replication_slot('pgl_target__ode1_s_rep');

I tested this scenario on multiple versions of both pglogical and PostgreSQL, and the outcome was the same crash every time.

pglogical versions (2.4.4, 2.4.5, 2.4.6)
PostgreSQL versions (16.x, 17.x)

aniljoshi2022 · Nov 02 '25 09:11

Root Cause Analysis: Generated Columns Crash

After extensive debugging, I've identified a root cause for crashes involving tables with filtered column lists (using replication_set_add_table with a columns parameter).

The Problem

When a table has columns excluded from replication (e.g., a generated column), fill_missing_defaults() in pglogical_apply_heap.c attempts to evaluate "defaults" for those missing columns. For generated columns (PostgreSQL 12+), this causes a segfault because:

  1. Generated columns don't have traditional defaults - they have generation expressions
  2. The generation expression depends on other column values that may not be properly accessible in this context
  3. build_column_default() returns the generation expression, and ExecEvalExpr() crashes when evaluating it

Reproduction

  • Table with a GENERATED ALWAYS AS ... STORED column
  • Exclude the generated column via replication_set_add_table(..., columns := ARRAY[...])
  • Perform an UPDATE on the provider
  • Subscriber crashes with SIGSEGV in the apply worker

Proposed Fix

Skip generated columns in fill_missing_defaults() since PostgreSQL automatically computes them during INSERT/UPDATE:

// pglogical_apply_heap.c, in fill_missing_defaults(), inside the for loop:

Form_pg_attribute att = TupleDescAttr(desc, attnum);

if (att->attisdropped)
    continue;

if (physatt_in_attmap(rel, attnum))
    continue;

/* Skip generated columns - computed automatically by PostgreSQL */
#if PG_VERSION_NUM >= 120000
if (att->attgenerated)
    continue;
#endif

defexpr = (Expr *) build_column_default(rel->rel, attnum + 1);

This fix is minimal, backwards-compatible (guarded for PG12+), and resolves the crash in my testing. Happy to submit a PR if this approach looks reasonable.
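
To make the effect of the extra check easier to see outside a PostgreSQL build, here is a small standalone model of the loop's filtering. It is only a sketch: ModelAttr and columns_needing_defaults() are hypothetical names that do not exist in pglogical, and the in_attmap flag merely stands in for the result of physatt_in_attmap(). The one fact borrowed from PostgreSQL is that attgenerated is a char that is '\0' for ordinary columns and non-zero (for example 's' for STORED) for generated ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the few per-column facts the loop inspects.
 * This is NOT the real Form_pg_attribute layout.
 */
typedef struct ModelAttr
{
    bool attisdropped;  /* column has been dropped */
    bool in_attmap;     /* models physatt_in_attmap(): value came from the provider */
    char attgenerated;  /* '\0' for plain columns, 's' for GENERATED ... STORED */
} ModelAttr;

/*
 * Collect the indexes of columns that would still reach default
 * evaluation once generated columns are skipped. Returns how many
 * indexes were written to out, which must have room for natts entries.
 */
size_t
columns_needing_defaults(const ModelAttr *atts, size_t natts, size_t *out)
{
    size_t n = 0;

    assert(atts != NULL && out != NULL);

    for (size_t i = 0; i < natts; i++)
    {
        if (atts[i].attisdropped)
            continue;           /* dropped columns carry no value */

        if (atts[i].in_attmap)
            continue;           /* the provider already supplied a value */

        if (atts[i].attgenerated != '\0')
            continue;           /* the proposed check: leave generated columns alone */

        out[n++] = i;           /* only these would get a default expression evaluated */
    }

    return n;
}
```

With a four-column table made of a replicated column, a local-only column, an excluded generated column, and a dropped column, only the local-only column is left for default evaluation. Without the attgenerated check, the generated column would be selected as well, which is the case that leads to the crash described above.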

Environment

  • PostgreSQL 18.1
  • pglogical 2.4.6
  • macOS

willibrandon · Dec 10 '25 19:12