citus icon indicating copy to clipboard operation
citus copied to clipboard

Wrong assert in ruleutils_xx.c

Open Green-Chan opened this issue 1 year ago • 3 comments

In #7379 assert in ruleutils_15.c was changed from

	while (i < colinfo->num_cols && colinfo->colnames[i] == NULL)
		i++;
	Assert(i == colinfo->num_cols);

to

	for (int col_index = 0; col_index < colinfo->num_cols; col_index++)
	{
		/*
		 * In the above processing-loops, "i" advances only if
		 * the column is not new, check if this is a new column.
		 */
		if (colinfo->is_new_col[col_index])
			i++;
	}
	Assert(i == colinfo->num_cols);

(later ruleutils_14.c and ruleutils_16.c were also changed). But that is still wrong. The following script leads to assert failure:

CREATE EXTENSION citus;
CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (x int, y int);
INSERT INTO t1 VALUES (1, 2), (4, 5);
INSERT INTO t2 VALUES (1, 1), (5, 5);
SELECT create_distributed_table('t2','x');
CREATE VIEW v AS SELECT * FROM t2 JOIN t1 on (t1.a=t2.x);
ALTER TABLE t2 ADD COLUMN z int;
SELECT * FROM v;

Here is the explanation of why assert was changed: https://github.com/citusdata/citus/pull/7379#discussion_r1437366160 It is wrong. is_new_col has nothing to do with colnames. If you take a look at the deparse_columns definition and comments there as a whole, you'll see that is_new_col actually about columns described by new_colnames. Yes, may be that comment should be changed to

Entries with is_new_col false must match the non-NULL new_colnames entries one-for-one. I also spent some time exploring what set_join_column_names() does, and that confirms that is_new_col is about new_colnames, not colnames.

So you should probably return the old assert back and reinvestigate why it fails. One of examples of failure is from multi_alter_table_statements test:

-- Create a reference table
CREATE TABLE tbl_ref_mats(row_id integer primary key);
INSERT INTO tbl_ref_mats VALUES (1), (2);
SELECT create_reference_table('tbl_ref_mats');

-- Create a distributed table
CREATE TABLE tbl_dist_mats(series_id integer);
INSERT INTO tbl_dist_mats VALUES (1), (1), (2), (2);
SELECT create_distributed_table('tbl_dist_mats', 'series_id');

-- Create a view that joins the distributed table with the reference table on the distribution key.
CREATE VIEW vw_citus_views as
SELECT d.series_id FROM tbl_dist_mats d JOIN tbl_ref_mats r ON d.series_id = r.row_id;

-- The view initially works fine
SELECT * FROM vw_citus_views ORDER BY 1;
-- Now, alter the table
ALTER TABLE tbl_ref_mats ADD COLUMN category1 varchar(50);
SELECT * FROM vw_citus_views ORDER BY 1;

From what I could understand, set_join_column_names() initialises colinfo->colnames and noldcolumns from rte->eref->colnames, and it expects that this list contains only old columns (meaning "columns that existed when the query was parsed"). For the above example rte->eref->colnames consists of "series_id", "row_id" and "category1". If you comment out

SELECT create_reference_table('tbl_ref_mats');

then rte->eref->colnames consists of "series_id" and "row_id", as expected.

Looks like rte->eref->colnames is formed wrong for reference table

Green-Chan avatar Apr 26 '24 16:04 Green-Chan

I'm not sure what has changed, but I get this assertion failure just running tests now. I have Ubuntu 22.04, gcc 11.4.0, PostgreSQL REL_16_3 configured with options CFLAGS=" -Og" --enable-tap-tests --enable-debug --with-openssl --with-libxml --enable-cassert --with-icu --with-lz4, Citus main (9e1852ea). When running make check-enterprise I get single_node_enterprise test failed and line

TRAP: failed Assert("i == colinfo->num_cols"), File: "deparser/ruleutils_16.c", Line: 1592, PID: 58606

in tmp_check/master/log/postmater.log

Green-Chan avatar Aug 15 '24 08:08 Green-Chan