citus icon indicating copy to clipboard operation
citus copied to clipboard

Citus upgrade faillure

Open parameswara001 opened this issue 6 months ago • 3 comments

I am upgrading citus cluster upgrade from PG15 to PG17. after pg_upgrade got completed but when i am doing "select citus_finish_pg_upgrade() " function got the following error.

citus=# SELECT citus_finish_pg_upgrade(); ERROR: function "array_cat_agg" already exists with same argument types CONTEXT: SQL statement " -- disable propagation to prevent EnsureCoordinator errors -- the aggregate created here does not depend on Citus extension (yet) -- since we add the dependency with the next command SET citus.enable_ddl_propagation TO OFF; CREATE AGGREGATE array_cat_agg(anycompatiblearray) (SFUNC = array_cat, STYPE = anycompatiblearray); COMMENT ON AGGREGATE array_cat_agg(anycompatiblearray) IS 'concatenate input arrays into a single array'; RESET citus.enable_ddl_propagation; " PL/pgSQL function citus_finish_pg_upgrade() line 8 at EXECUTE

and one more is the following error

citus=# SELECT citus_finish_pg_upgrade(); ERROR: more than one row returned by a subquery used as an expression CONTEXT: SQL statement "INSERT INTO pg_depend SELECT 'pg_proc'::regclass::oid as classid, (SELECT oid FROM pg_proc WHERE proname = 'array_cat_agg') as objid, 0 as objsubid, 'pg_extension'::regclass::oid as refclassid, (select oid from pg_extension where extname = 'citus') as refobjid, 0 as refobjsubid , 'e' as deptype" PL/pgSQL function citus_finish_pg_upgrade() line 36 at SQL statement

Can you help me to resolve this issue.

parameswara001 avatar Jun 09 '25 06:06 parameswara001

Which Citus version are you on?

Can you try to upgrade in single steps? from PG15->PG16 and then PG16->PG17?

emelsimsek avatar Jun 12 '25 09:06 emelsimsek

@emelsimsek I think I found the problem. Citus is missing upgrade path from 12.1.5 to 13.1.0

This is what we found yesterday night:

Starting point: PostgreSQL 15, Citus 12.1.5

We upgraded from 12.1.5 to 13.0.4 and then from 13.0.4 to 13.1.0 without any issue. Finally upgrading PostgreSQL 15 to 17 worked.

if we skip 13.0.4 update, the procedure fails with the message described above.

I think Citus should add extension upgrade path for all supported combinations.

devrimgunduz avatar Jun 18 '25 08:06 devrimgunduz

So both the for OP and me we'll need a proper upgrade script to patch the systems

devrimgunduz avatar Jun 18 '25 08:06 devrimgunduz

I do not think that we have a problem regarding a missing upgrade path here. Citus extension versions are of the form vX.Y.Z and extension schema versions are in the form X.Y-Z and there is not a one to one mapping as we do not always bump the schema version on each extension release.

v12.1.5 uses migration version of 12.1-1

https://github.com/citusdata/citus/blob/15ecc37ecd766af4393c9bb222aa1efe038dc5f3/src/backend/distributed/citus.control#L3

v13.1.0 uses migration version of 13.1-1

https://github.com/citusdata/citus/blob/ad266a2c0a3c9bdab1426ed33d1bc2d9a7d74f45/src/backend/distributed/citus.control#L3

v13.1.0 contains a migration path from 12.1-1 to 13.0-1 to 13.1-1

https://github.com/citusdata/citus/blob/v13.1.0/src/backend/distributed/sql/citus--12.1-1--13.0-1.sql

https://github.com/citusdata/citus/blob/v13.1.0/src/backend/distributed/sql/citus--13.0-1--13.1-1.sql

hanefi avatar Jun 18 '25 16:06 hanefi

So both the for OP and me we'll need a proper upgrade script to patch the systems

I agree with this. I guess we should fix what is broken with the current citus_finish_pg_upgrade() implementation and add it to a new migration script for several recent Citus major versions.

hanefi avatar Jun 18 '25 16:06 hanefi

@parameswara001, did you run pg_catalog.citus_prepare_pg_upgrade() at the beginning as described here? https://docs.citusdata.com/en/v13.0/admin_guide/upgrading_citus.html#upgrading-postgresql-version-from-16-to-17

emelsimsek avatar Jun 18 '25 19:06 emelsimsek

@emelsimsek in our case: Yes.

devrimgunduz avatar Jun 19 '25 14:06 devrimgunduz

@devrimgunduz @emelsimsek @hanefi This is Mir from ScaleGrid, a leading Database-as-a-Service (DBaaS) provider.

We’re excited to announce the upcoming launch of our support for Citus on PostgreSQL — compatible up to version 17. Our platform manages the entire database lifecycle, including provisioning, setup, version upgrades, automated backups, and more.

The service will be available across multiple cloud platforms, including AWS, GCP, DigitalOcean, Akamai, OCI, and Zadara.

We’d love for you to try it out and share your feedback on the experience. Your input would be incredibly valuable as we continue refining the offering.

Looking forward to hearing your thoughts!

mirazam1640 avatar Jun 24 '25 21:06 mirazam1640

Cannot reproduce the issue unless I omit citus_prepare_pg_upgrade() - can it be the case that you mistakenly executed citus_prepare_pg_upgrade() on the wrong database?

Here is the quick script I used to try reproducing the issue. It assumes that you have pgenv and citus_dev available in your PATH, and the Citus repo. First five params need to be adjusted in case someone is interested in executing the script.

#!/bin/bash

# params to adjust for your env and what you want to test, old / new Citus versions are fixed
PGENV_PATH="/home/onurctirtir/.pgenv"
TEST_DBS_PATH="/home/onurctirtir/test-dbs"
CITUS_PATH="/home/onurctirtir/citus"
OLD_PG_VER="15.13"
NEW_PG_VER="17.4"

# Actual database that's part of the Citus cluster that we want to upgrade.
# Rest of the code will assume that there is a separate "postgres" database
# that we can connect to drop the database UPGRADE_DB_NAME.
# citus_dev always creates a database with the same name as the user and uses
# it to build the Citus cluster.
UPGRADE_DB_NAME="$whoami"

pkill postgres

pgenv build "$OLD_PG_VER"
pgenv build "$NEW_PG_VER"

mkdir -P "$TEST_DBS_PATH"

set -euo pipefail

# 1 - create old cluster with Citus 12.1.5 and with OLD_PG_VER
pgenv switch "$OLD_PG_VER"

cd $CITUS_PATH
git checkout v12.1.5
make clean && make install-all -sj16

citus_dev make "$TEST_DBS_PATH/test-upgrade-$OLD_PG_VER"/ --destroy

# can simulate some user activity here #

# 2 - upgrade to Citus 13.1.0
cd $CITUS_PATH
git checkout v13.1.0
make clean && make install-all -sj16

citus_dev restart "$TEST_DBS_PATH/test-upgrade-$OLD_PG_VER"/

for port in 9700 9701 9702; do
    psql -p $port -d "$UPGRADE_DB_NAME" -c "ALTER EXTENSION citus UPDATE;"
done

# 3 - prepage for pg_upgrade
for port in 9700 9701 9702; do
    psql -p $port -d "$UPGRADE_DB_NAME" -c "SELECT citus_prepare_pg_upgrade();"
done

citus_dev stop "$TEST_DBS_PATH/test-upgrade-$OLD_PG_VER"/

# 4 - create new cluster and drop the database UPGRADE_DB_NAME
pgenv switch "$NEW_PG_VER"

cd $CITUS_PATH
make clean && make install-all -sj16

citus_dev make "$TEST_DBS_PATH/test-upgrade-$NEW_PG_VER"/ --destroy

# Drop the database UPGRADE_DB_NAME. (citus_dev always creates a database with the
# same name as the user and uses it for the Citus cluster)
for port in 9700 9701 9702; do
    psql -p $port -d postgres -c "drop database if exists $UPGRADE_DB_NAME (force)";
done

citus_dev stop "$TEST_DBS_PATH/test-upgrade-$NEW_PG_VER"/

# 5 - run pg_upgrade for NEW_PG_VER and run citus_finish_pg_upgrade()
function upgrade_one {
    subdir="$1"
    port="$2"

    pg_upgrade \
    -d "$TEST_DBS_PATH/test-upgrade-$OLD_PG_VER"/"$subdir"/ \
    -D "$TEST_DBS_PATH/test-upgrade-$NEW_PG_VER"/"$subdir"/ \
    -b "$PGENV_PATH"/pgsql-"$OLD_PG_VER"/bin/ \
    -B "$PGENV_PATH"/pgsql-"$NEW_PG_VER"/bin/ \
    -p $port -P $port
}

upgrade_one coordinator 9700
upgrade_one worker0 9701
upgrade_one worker1 9702

citus_dev start "$TEST_DBS_PATH/test-upgrade-$NEW_PG_VER"/

for port in 9700 9701 9702; do
    psql -p $port -d "$UPGRADE_DB_NAME" -c "SELECT citus_finish_pg_upgrade();"
done

onurctirtir avatar Jun 25 '25 13:06 onurctirtir

I also reviewed the issue and agree with the previous comments — it is reproducible when citus_prepare_pg_upgrade() is skipped. One note, though: the first step in our documentation might be a bit misleading.

It says 'Back up Citus metadata in the old coordinator node,' but as the section title suggests, this step should actually be performed on every node.

https://docs.citusdata.com/en/stable/admin_guide/upgrading_citus.html

eaydingol avatar Oct 03 '25 08:10 eaydingol

doc change --> https://github.com/citusdata/citus_docs/pull/1139 Both the title above and the code comment below states the need to run all nodes

ihalatci avatar Oct 21 '25 04:10 ihalatci