Fix downgrade and following upgrade

Status: Open. manaldush opened this pull request 8 months ago (7 comments).

When Citus is downgraded and then upgraded again, Citus crashes with a core dump. The reason is that Citus hardcodes the number of columns in the pg_dist_partition table, but after a downgrade followed by an upgrade the table can have more columns, some of which may be marked as dropped.

This patch fixes the problem by using tupleDescriptor->natts, the approach PostgreSQL itself uses internally.

Fixes #7933.
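To make the failure mode concrete, here is a minimal standalone C sketch of one plausible way a hardcoded column count turns into a crash. It does not use PostgreSQL's real headers. SketchAttr, SketchTupleDesc, SKETCH_NATTS_HARDCODED and the sketch_* functions are hypothetical stand-ins. sketch_deform only imitates a deform routine such as heap_deform_tuple, which fills one values/isnull slot per attribute in the descriptor (natts), dropped attributes included. If the caller sized those arrays with a compile-time constant and the table carries extra (dropped) columns after a downgrade and upgrade, the deform writes past the arrays. Sizing them from natts and skipping dropped attributes avoids that. Looking columns up by name is purely illustrative and is not necessarily what this patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for one attribute of a
 * PostgreSQL tuple descriptor. */
typedef struct SketchAttr
{
	const char *name;
	bool isDropped;            /* plays the role of pg_attribute.attisdropped */
} SketchAttr;

/* Hypothetical stand-in for a tuple descriptor: natts counts every
 * physical attribute, including attributes marked as dropped. */
typedef struct SketchTupleDesc
{
	int natts;
	const SketchAttr *attrs;
} SketchTupleDesc;

/* Hypothetical compile-time column count, in the style of a Natts_* macro. */
#define SKETCH_NATTS_HARDCODED 3

/* Imitates a deform routine: writes exactly desc->natts entries into
 * values/isnull, however large the caller's arrays are. In this sketch,
 * dropped attributes are simply reported as NULL. */
void
sketch_deform(const SketchTupleDesc *desc, const int *row, int *values, bool *isnull)
{
	for (int i = 0; i < desc->natts; i++)
	{
		values[i] = row[i];
		isnull[i] = desc->attrs[i].isDropped;
	}
}

/* Fragile pattern: "int values[SKETCH_NATTS_HARDCODED];" followed by
 * sketch_deform(). Returns true when natts exceeds the constant, i.e. when
 * that pattern would write past the end of the arrays. */
bool
sketch_hardcoded_arrays_overflow(const SketchTupleDesc *desc)
{
	return desc->natts > SKETCH_NATTS_HARDCODED;
}

/* Safer pattern: size the arrays from desc->natts, deform, then return the
 * value of the first non-dropped attribute whose name matches. Returns
 * false if no such column exists or allocation fails. */
bool
sketch_read_column(const SketchTupleDesc *desc, const int *row,
				   const char *column, int *out)
{
	assert(desc != NULL && row != NULL && column != NULL && out != NULL);

	int *values = malloc(sizeof(int) * (size_t) desc->natts);
	bool *isnull = malloc(sizeof(bool) * (size_t) desc->natts);
	bool found = false;

	if (values != NULL && isnull != NULL)
	{
		sketch_deform(desc, row, values, isnull);
		for (int i = 0; i < desc->natts && !found; i++)
		{
			if (!desc->attrs[i].isDropped && !isnull[i] &&
				strcmp(desc->attrs[i].name, column) == 0)
			{
				*out = values[i];
				found = true;
			}
		}
	}

	free(values);
	free(isnull);
	return found;
}
```

A caller would build a SketchTupleDesc whose attrs array contains one extra attribute with isDropped = true: sketch_hardcoded_arrays_overflow then reports that the fixed-size pattern would overflow, while sketch_read_column still finds the live columns by walking all natts attributes.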

manaldush, Apr 03 '25 15:04

@microsoft-github-policy-service agree

manaldush, Apr 03 '25 15:04

Codecov Report

❌ Patch coverage is 61.38614% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.48%. Comparing base (c183634) to head (f07741b).
⚠️ Report is 1 commit behind head on main.

❌ Your patch status has failed because the patch coverage (61.38%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (36.48%) is below the target coverage (87.50%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (c183634) and HEAD (f07741b).

HEAD has 72 fewer uploads than BASE. Uploads per flag:

BASE (c183634)  HEAD (f07741b)  Flag
1               0               15_17_upgrade
1               0               15_16_upgrade
1               0               15_regress_check-columnar-isolation
1               0               17_regress_check-columnar-isolation
1               0               16_regress_check-columnar-isolation
1               0               16_regress_check-follower-cluster
1               0               17_regress_check-follower-cluster
1               0               15_regress_check-follower-cluster
1               0               17_regress_check-query-generator
1               0               15_regress_check-enterprise-isolation-logicalrep-2
1               0               15_regress_check-enterprise-isolation-logicalrep-3
1               0               17_regress_check-enterprise-isolation-logicalrep-2
1               0               15_regress_check-columnar
1               0               17_regress_check-columnar
1               0               17_regress_check-split
1               0               16_regress_check-query-generator
1               0               16_regress_check-enterprise-isolation-logicalrep-3
1               0               15_regress_check-enterprise-failure
1               0               16_regress_check-split
1               0               16_regress_check-columnar
1               0               17_regress_check-enterprise-isolation-logicalrep-3
1               0               15_regress_check-split
1               0               17_regress_check-enterprise-failure
1               0               16_regress_check-enterprise-failure
1               0               16_regress_check-enterprise-isolation-logicalrep-2
1               0               15_regress_check-query-generator
1               0               16_regress_check-enterprise
1               0               15_regress_check-vanilla
1               0               17_regress_check-enterprise
1               0               17_regress_check-vanilla
1               0               16_regress_check-vanilla
1               0               15_regress_check-enterprise
1               0               15_regress_check-enterprise-isolation
1               0               17_regress_check-enterprise-isolation
1               0               15_cdc_installcheck
1               0               16_cdc_installcheck
1               0               16_arbitrary_configs_5
1               0               16_arbitrary_configs_1
1               0               17_arbitrary_configs_1
1               0               15_arbitrary_configs_1
1               0               15_arbitrary_configs_3
1               0               16_arbitrary_configs_3
1               0               17_arbitrary_configs_5
1               0               16_arbitrary_configs_2
1               0               15_arbitrary_configs_2
1               0               17_arbitrary_configs_2
1               0               15_regress_check-multi-1
1               0               16_regress_check-failure
1               0               15_arbitrary_configs_5
1               0               17_arbitrary_configs_3
1               0               16_regress_check-enterprise-isolation
1               0               17_regress_check-operations
1               0               16_regress_check-enterprise-isolation-logicalrep-1
1               0               17_regress_check-multi-mx
1               0               15_regress_check-enterprise-isolation-logicalrep-1
1               0               16_regress_check-multi-mx
1               0               17_regress_check-enterprise-isolation-logicalrep-1
1               0               15_regress_check-multi-mx
1               0               15_regress_check-failure
1               0               17_regress_check-failure
1               0               16_regress_check-isolation
1               0               17_regress_check-isolation
1               0               15_regress_check-multi
1               0               16_regress_check-multi
1               0               17_regress_check-multi
1               0               17_arbitrary_configs_0
1               0               16_arbitrary_configs_4
1               0               16_regress_check-multi-1
1               0               17_arbitrary_configs_4
1               0               17_regress_check-multi-1
1               0               16_arbitrary_configs_0
1               0               15_arbitrary_configs_0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #7950       +/-   ##
===========================================
- Coverage   89.21%   36.48%   -52.73%     
===========================================
  Files         284      284               
  Lines       61876    61669      -207     
  Branches     7746     7674       -72     
===========================================
- Hits        55201    22500    -32701     
- Misses       4458    36576    +32118     
- Partials     2217     2593      +376     

codecov[bot], Apr 04 '25 12:04

@onurctirtir, I re-ran make reindent and committed the changes.

manaldush, Apr 07 '25 13:04

Thank you! Let's wait for others to chime in a bit.

onurctirtir, Apr 07 '25 13:04

Oh, it seems some of the regression tests are failing. Could you look into why, @manaldush?

For example, in https://github.com/citusdata/citus/actions/runs/14310120419?pr=7950, multi_cluster_management is failing. You can run this test locally as follows (of course, the other failures need to be fixed too):

cd src/test/regress/

# one time setup begins
pip install pipenv
pipenv --rm
pipenv install
# one time setup ends

pipenv shell

citus_tests/run_test.py multi_cluster_management --use-base-schedule --use-whole-schedule-line

Also see src/test/regress/README.md.

onurctirtir, Apr 07 '25 15:04

@onurctirtir, I found the problem and fixed it.

manaldush, Apr 08 '25 13:04

@onurctirtir @manaldush I tried the patch on Ubuntu 22.04 on ARM (a UTM virtual machine on an M1 Mac).

It certainly behaves better than before: multi_extension now passes on some runs. However, it still fails on some runs when I run the test repeatedly using:

cd src/test/regress
pipenv run citus_tests/run_test.py multi_extension --use-base-schedule --use-whole-schedule-line

Looking at the regression.diffs file, the cause appears to be the same as before: the server crashes.

+SSL SYSCALL error: EOF detected
+connection to server was lost

alperkocatas, Apr 25 '25 20:04