spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-49119][SQL] Fix the inconsistency of syntax `show columns` between v1 and v2

Open panbingkun opened this issue 1 year ago • 18 comments

What changes were proposed in this pull request?

The pr aims to

  • fix the inconsistency of syntax show columns between v1 and v2.
  • assign a name SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE to the error condition _LEGACY_ERROR_TEMP_1057.
  • unify v1 and v2 SHOW COLUMNS ... tests.
  • move some UT related to SHOW COLUMNS from DDLSuite to command/ShowColumnsSuiteBase or v1/ShowColumnsSuiteBase.
  • move some UT related to SHOW COLUMNS from DDLParserSuite and ErrorParserSuite to ShowColumnsParserSuite.

Why are the changes needed?

In AstBuilder, we have a comment that explains as follows: https://github.com/apache/spark/blob/2a752105091ef95f994526b15bae2159657c8ed0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L5054-L5055 However, in our v2 of the syntax show columns implementation, we did not perform the above checks, as shown below:

  withNamespaceAndTable("ns", "tbl") { t =>
      sql(s"CREATE TABLE $t (col1 int, col2 string) $defaultUsing")
      sql(s"SHOW COLUMNS IN $t IN ns1")
    }
  • Before (inconsistent, v1 will fail, but v2 will success) v1:

    [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.
    

    v2:

    Execute successfully.
    

so, we should fix it.

  • After (consistent, v1 & v2 all will fail) v1:

    [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.
    

    v2:

    [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.
    

Does this PR introduce any user-facing change?

Yes, for v2 tables, in syntax SHOW COLUMNS {FROM | IN} {tableName} {FROM | IN} {namespace}, if the namespace (second parameter) is different from the namespace of the table(first parameter), the command will succeed without any awareness before this PR, after this PR, it will report an error.

How was this patch tested?

Add new UT.

Was this patch authored or co-authored using generative AI tooling?

No.

panbingkun avatar Aug 06 '24 08:08 panbingkun

cc @cloud-fan @yaooqinn

panbingkun avatar Aug 06 '24 09:08 panbingkun

In my opinion, when constructing the AST, we should always prepend the "ns" identifier for the table identifier. This is because the syntax states that "a.b.c" is IN/FROM "x.y.z". I can't help but think of concatenating them as "x.y.z.a.b.c" instead of zipping. Currently, we only do this for tables without a specified namespace field.

yaooqinn avatar Aug 07 '24 03:08 yaooqinn

In my opinion, we should not support SHOW COLUMNS t FROM ... in the first place. A fully qualified table name should do the work, there is no need for the FROM clause. But the ship is already sailed and this is what it is now.

cloud-fan avatar Aug 07 '24 03:08 cloud-fan

In my opinion, we should not support SHOW COLUMNS t FROM ... in the first place. A fully qualified table name should do the work, there is no need for the FROM clause. But the ship is already sailed and this is what it is now.

FROM and IN are interchangeable. Identifying the tbl or ns is based on the order not FROM and IN keywords, and the 2nd FROM and IN clause is identified as ns

yaooqinn avatar Aug 07 '24 03:08 yaooqinn

You can identify a table with the qualified name. This is a common syntax in many other commands. SHOW COLUMNS FROM t FROM schema has no benefit but welcomes confusion.

cloud-fan avatar Aug 07 '24 04:08 cloud-fan

So, I recommend prepending it in the first step for the relation resolution, which can somewhat decrease the confusion

yaooqinn avatar Aug 07 '24 05:08 yaooqinn

  • Yes, in the syntax SHOW COLUMNS ... {FROM | IN} ... {FROM | IN} ..., the second clause {FROM | IN} .. is actually unnecessary, but due to historical reasons, if we change it to ``SHOW COLUMNS ... {FROM | IN} ..., it is a break change` (at least when the end-user run it, it will generate an error).
  • The modification here is only to align this syntax in v1 and v2, not checking can cause confusion for end-users, eg in the following scenario:
SHOW COLUMNS in ns.tbl in ns2.

panbingkun avatar Aug 07 '24 06:08 panbingkun

... which can somewhat decrease the confusion

Yes, but I don't think we should make a breaking change only to reduce confusion for a legacy/unrecommanded feature.

cloud-fan avatar Aug 07 '24 06:08 cloud-fan

make a breaking change

I acknowledge that it will cause a breaking change for a corner case for V1 tables, SHOW COLUMNS IN DB.TBL IN DB.

yaooqinn avatar Aug 07 '24 06:08 yaooqinn

@panbingkun can you update Does this PR introduce any user-facing change? I think it does affect v2 tables.

cloud-fan avatar Aug 07 '24 09:08 cloud-fan

@panbingkun can you update Does this PR introduce any user-facing change? I think it does affect v2 tables.

Okay

panbingkun avatar Aug 07 '24 09:08 panbingkun

@panbingkun can you update Does this PR introduce any user-facing change? I think it does affect v2 tables.

Updated.

panbingkun avatar Aug 07 '24 09:08 panbingkun

since it introduces a new error, shall we add a legacy config?

cloud-fan avatar Aug 08 '24 02:08 cloud-fan

since it introduces a new error, shall we add a legacy config?

Okay, let me to add it.

panbingkun avatar Aug 08 '24 03:08 panbingkun

since it introduces a new error, shall we add a legacy config?

Updated. (spark.sql.legacy.showColumnsCheckNamespace)

panbingkun avatar Aug 08 '24 06:08 panbingkun

ShowColumnsExec was supported a few days ago, what legacy behavior do we have to restore?

yaooqinn avatar Aug 15 '24 12:08 yaooqinn

@yaooqinn you are right! We don't need a config then :)

cloud-fan avatar Aug 15 '24 13:08 cloud-fan

@yaooqinn you are right! We don't need a config then :)

Okay, updated!

panbingkun avatar Aug 15 '24 14:08 panbingkun

Merged into master. Thanks @panbingkun @cloud-fan and @yaooqinn

LuciferYang avatar Aug 30 '24 08:08 LuciferYang

Merged into master. Thanks @panbingkun @cloud-fan and @yaooqinn

Thanks all @LuciferYang @cloud-fan @yaooqinn

panbingkun avatar Aug 30 '24 08:08 panbingkun