gpdb icon indicating copy to clipboard operation
gpdb copied to clipboard

Over-eager constraint exclusion

Open hlinnaka opened this issue 4 years ago • 7 comments

Greenplum version or build

Tested on GPDB 6, all versions are presumably affected.

Expected behavior

-- Create test table with one partition, for values equal to '1'.
create table parttab (n numeric, t text)
  partition by list (n)(partition one values ('1'));

-- Insert three rows. They're all equal to '1', but different number of zeros after decimal point.
insert into parttab values
  ('1', 'one'),
  ('1.0', 'one point zero'),
  ('1.00', 'one point zero zero');

-- select rows whose text representation is three characters long. This should return the '1.0' row.
select * from parttab where length(n::text) = 3;
  n  |       t        
-----+----------------
 1.0 | one point zero
(1 row)

Actual behavior

postgres=# select * from parttab where length(n::text) = 3;
 n | t 
---+---
(0 rows)

hlinnaka avatar Jun 11 '20 20:06 hlinnaka

The culprit is in the simple_equality_predicate_refuted() function:

/**
 * Check to see if the predicate is expr=constant or constant=expr. In that case, try to evaluate the clause
 *   by replacing every occurrence of expr with the constant.  If the clause can then be reduced to FALSE, we
 *   conclude that the expression is refuted
 *
 * Returns true only if evaluation is possible AND expression is refuted based on evaluation results
 *
 * MPP-18979:
 * This mechanism cannot be used to prove implication. One example expression is
 * "F(x)=1 and x=2", where F(x) is an immutable function that returns 1 for any input x.
 * In this case, replacing x with 2 produces F(2)=1 and 2=2. Although evaluating the resulting
 * expression gives TRUE, we cannot conclude that (x=2) is implied by the whole expression.
 *
 */
static bool
simple_equality_predicate_refuted(Node *clause, Node *predicate)

Partition constraint: (n = '1'::numeric) Qual in query: (length(n::text) = 3)

It substitutes 'n' in the Qual with '1'::numeric from the partition constraint, and evaluates (length('1'::numeric::text) = 3) to false. But that substitution is not OK. There can be values other than '1' in the table, like '1.00', which are equal to '1', but are not the same value.

That substitution is valid for some data types, like integers, but we don't know it in general. I'm not sure what to call that property.

hlinnaka avatar Jun 11 '20 21:06 hlinnaka

I think this is the same bug for https://github.com/greenplum-db/gpdb/issues/8582

For that issue, I have a comment before in my local branch:

    simple_equality_predicate_refuted is implemented by Greenplum,
    Postgres does not contain such function. It is useful in GPDB
    to do partition pruning. And partition key cannot be null except
    default partition (which means partition table innately contain a
    constrain partition_key is not null for non-default leaves.

    Also notice that constrain `a = 1` does not reject the tuple `null`.
    For non partition tables, we do more test by set a to null, so that

    create table t(a int check(a = 1));
    insert into t values (null);
    select * from t where a is null;

    will not generate dummy plan.

But I do not continue to finish that work....

kainwen-zz avatar Jun 11 '20 23:06 kainwen-zz

Partition constraint: (n = '1'::numeric) Qual in query: (length(n::text) = 3)

It substitutes 'n' in the Qual with '1'::numeric from the partition constraint, and evaluates (length('1'::numeric::text) = 3) to false. But that substitution is not OK. There can be values other than '1' in the table, like '1.00', which are equal to '1', but are not the same value.

That substitution is valid for some data types, like integers, but we don't know it in general. I'm not sure what to call that property.

Good catch. Yes, this is the issue.

Do we have any idea how to fix it?

As a workaround, how about we constrain the function (simple_equality_predicate_refuted) to only work with input datatypes that we know we are safe with (such as integers)?

guofengrichard avatar Jun 15 '20 09:06 guofengrichard

On 15/06/2020 12:45, Richard Guo wrote:

Partition constraint: (n = '1'::numeric)
Qual in query: (length(n::text) = 3)

It substitutes 'n' in the Qual with '1'::numeric from the partition
constraint, and evaluates (length('1'::numeric::text) = 3) to false.
But that substitution is not OK. There can be values other than '1'
in the table, like '1.00', which are equal to '1', but are not the
same value.

That substitution is valid for some data types, like integers, but
we don't know it in general. I'm not sure what to call that property.

Good catch. Yes, this is the issue.

Do we have any idea how to fix it?

Not really. Rip it out...

As a workaround, how about we constrain the function (|simple_equality_predicate_refuted|) to only work with input datatypes that we know we are safe with (such as integers)?

Yeah, that would work. It's a bit disappointing, but it's probably the best we can do in stable branches.

  • Heikki

hlinnaka avatar Jun 15 '20 09:06 hlinnaka

As a workaround, how about we constrain the function (|simple_equality_predicate_refuted|) to only work with input datatypes that we know we are safe with (such as integers)?

I have some concern.

Postgres's partition table support user-defined opclass, which means user can determine the equality as it will. (Greenplum does not support yet, but it will be soon after we merge to pg12).

I am not sure if the implementation of partition table changes this after merging to pg12.

For example:

zlv=# CREATE TABLE ptop_test (a int, b int, c int) PARTITION BY LIST (a abs_int_btree_ops);
CREATE TABLE
zlv=# CREATE TABLE ptop_test_0 PARTITION OF ptop_test FOR VALUES IN ('0');
CREATE TABLE
zlv=# insert into ptop_test(-1, -1, -1)
zlv=# CREATE TABLE ptop_test_1 PARTITION OF ptop_test FOR VALUES IN ('1');
CREATE TABLE
zlv=# insert into ptop_test values (-1, 1, 1);
INSERT 0 1
zlv=# select * from ptop_test;
 a  | b | c
----+---+---
 -1 | 1 | 1
(1 row)

zlv=# insert into ptop_test values (1, 1, 1);
INSERT 0 1
zlv=# select * from ptop_test;
 a  | b | c
----+---+---
 -1 | 1 | 1
  1 | 1 | 1
(2 rows)

zlv=# select * from ptop_test_1;
 a  | b | c
----+---+---
 -1 | 1 | 1
  1 | 1 | 1
(2 rows)

The above case is based on postgres 12. You can see 1 and -1 are both in the same leaf partition which is just the same as the case in this issue.

So if after merging, we still have the function simple_equality_predicate_refuted , we should face the above issue, and I do not think we can make any conclusion.

So for master branch:

  1. if we keep simple_equality_predicate_refuted, we should remove it. 2.if we remove it during pg 12 merge, we might do not care it for master branch.

Any ideas?

kainwen-zz avatar Jun 15 '20 12:06 kainwen-zz

The above case is based on postgres 12. You can see 1 and -1 are both in the same leaf partition which is just the same as the case in this issue.

Quite right. If we go with the whitelist approach, it needs to be based on the equality operator, not the datatype.

hlinnaka avatar Jun 16 '20 09:06 hlinnaka

The above case is based on postgres 12. You can see 1 and -1 are both in the same leaf partition which is just the same as the case in this issue.

Quite right. If we go with the whitelist approach, it needs to be based on the equality operator, not the datatype.

Hi @hlinnaka I do not think any whitelist method can work. It is going to the wrong direction to add whitelist.

The root issue is:

  1. leaf partition t_part has a constrain (a = const in some type)
  2. we can insert many values into that leaf partition, say a1, a2, a3 unless they obey the partition's constrain's (equality)

select * from t where my_udf(a) = xxxx;

my_udf is a UDF that included from some lib or something else, it may never invoke equality operator. For example, it may just be MD5 calculate function.

Can we guarantee the UDF my_udf has the following property: for any a1, a2, a3 that obeys the constrain (a = const in some type) under the specific equality my_udf(a1) = my_udf(a2) = my_udf(a3) when eval a qual?

I do not think we or the user can guarantee the property.


My idea for fixing this issue is:

  1. totally remove the logic introduced by gpdb simple_equality_predicate_refuted
  2. if I remember correctly, there is only one case failing after the removal, let's see what can we do for it:
explain select * from ds_4 where case when month_id = '200800' then NULL else 2 end IS NULL;

kainwen-zz avatar Jun 17 '20 03:06 kainwen-zz

@kainwen What is pending on the issue?

dpandhi-git avatar Nov 17 '22 23:11 dpandhi-git

Short version:

  • for main branch: remove simple_equality_predicate_refuted and all related code
  • for 6x: do nothing (until later we hear real complains).

the close this issue.


TL; DR

Let me summarize the issue now.

The culprit is in the simple_equality_predicate_refuted() function:

As Heikki mentioned above, this Greenplum's own added functions lead to the bug. If we remove them, then this issue is gone (Both 6X and main).

But the removal of simple_equality_predicate_refuted and all related code, it will fail a test case of partition pruning:

explain select * from ds_4 where case when month_id = '200800' then NULL else 2 end IS NULL

NOTE: There is no problem in Greenplum main branch. Because after merging to Postgres v12, the implementation of partition table changes a lot. Main branch does not implement partition use constraint any more. And with the main branch (top commit cee9e392401912226f85312d0a9cc99601946193), it does not prune partition for the above SQL and this is the behavior of Postgres.

The abstract model of the issue is

  • given a value set defined by some eqclass
    • eq constraint, or
    • gpdb main branch's leaf partition def
  • given a bool type expression exp

Answer the question: will exp always be false?

GPDB main branch, partition table's part key can be defined with user defined opclass, this almost make the problem impossible to resolve: it enlarges the eq class set of the "same" value.

To accurately resolve the problem, we have to write a new interpreter to implement the new semantics.

kainwen avatar Nov 28 '22 13:11 kainwen

Fixed by https://github.com/greenplum-db/gpdb/pull/14553 in main branch. Close it. (for 6x do nothing until we hear real complains).

@zxuejing I recommend to add the case in this issue to main branch also. The above PR is pushed, lets open a new one to add a case. Thanks a lot!

kainwen avatar Dec 01 '22 12:12 kainwen