docs icon indicating copy to clipboard operation
docs copied to clipboard

Bug Report

Open MyKWK opened this issue 1 year ago • 0 comments

BUG Report

TiDB 8.3.0

Original Query

SELECT ALL l_extendedprice, l_orderkey, l_discount
FROM lineitem
WHERE (coalesce(l_quantity, l_suppkey, 0.21501538554113775, l_receiptdate)) <= l_returnflag
GROUP BY l_extendedprice, l_orderkey, l_discount;

Return: 6005 rows

Rewritten Query

SELECT l_extendedprice, l_orderkey, l_discount
FROM lineitem
WHERE CASE
    WHEN l_quantity IS NOT NULL THEN CAST(l_quantity AS decimal(15,2))
    WHEN l_suppkey IS NOT NULL THEN CAST(l_suppkey AS unsigned)
    ELSE 0.21501538554113775
END <= l_returnflag
GROUP BY l_orderkey, l_extendedprice, l_discount

Return: 0 rows


Analysis

1. Logical Equivalence of the Queries

The original and rewritten queries are intended to be semantically equivalent:

  • Both select l_extendedprice, l_orderkey, and l_discount from the lineitem table.
  • Both filter rows by comparing the first non-NULL value among l_quantity, l_suppkey, 0.21501538554113775, and l_receiptdate to l_returnflag.

- Original Query (using COALESCE):

SELECT ALL l_extendedprice, l_orderkey, l_discount
FROM lineitem
WHERE (coalesce(l_quantity, l_suppkey, 0.21501538554113775, l_receiptdate)) <= l_returnflag
GROUP BY l_extendedprice, l_orderkey, l_discount;

- Rewritten Query (using CASE):

SELECT l_extendedprice, l_orderkey, l_discount
FROM lineitem
WHERE CASE
    WHEN l_quantity IS NOT NULL THEN CAST(l_quantity AS decimal(15,2))
    WHEN l_suppkey IS NOT NULL THEN CAST(l_suppkey AS unsigned)
    ELSE 0.21501538554113775
END <= l_returnflag
GROUP BY l_orderkey, l_extendedprice, l_discount

Expected Behavior

Logically, these two queries should produce the same result set for all rows, as the rewritten query mimics the logic of COALESCE with CASE WHEN.


2. Bug Manifestation

  • Original Query result: 6005 rows
  • Rewritten Query result: 0 rows

This large discrepancy indicates that the semantic equivalence between the queries does not hold in TiDB 8.3.0 for this case.


3. Potential Cause

  • Column Order in GROUP BY:
    The original query uses GROUP BY l_extendedprice, l_orderkey, l_discount, whereas the rewritten query uses GROUP BY l_orderkey, l_extendedprice, l_discount.
    While standard SQL considers group column order insignificant, some engines may produce different plans if the order is changed.

  • Type Casting Differences:

    • COALESCE returns the first non-NULL value as is, maintaining its type.
    • The rewritten CASE casts l_quantity to decimal(15,2) and l_suppkey to unsigned.
      If the types differ from those in the original, it may change comparison logic or evaluation, causing filter mismatches.
  • Implicit vs Explicit Null Handling:

    • COALESCE and CASE should be functionally similar, but differences in how TiDB handles type conversions and nullability can result in different evaluation orders or filter results.
  • Edge Case: l_receiptdate Ignored in Rewritten Query:

    • In the rewritten query, l_receiptdate is omitted, whereas it is the last fallback in the COALESCE. This means when all other columns are NULL, the rewritten query will not use l_receiptdate, while the original will.

How to repeat ?

The exported file for the database is in the attachment. : (https://github.com/LLuopeiqi/newtpcd/blob/main/tidb/tpcd.sql) .

MyKWK avatar Nov 20 '24 14:11 MyKWK