[SPARK-49119][SQL] Fix the inconsistency of syntax `show columns` between v1 and v2
What changes were proposed in this pull request?
The pr aims to
- fix the
inconsistencyof syntaxshow columnsbetweenv1andv2. - assign a name
SHOW_COLUMNS_WITH_CONFLICT_NAMESPACEto the error condition_LEGACY_ERROR_TEMP_1057. - unify v1 and v2
SHOW COLUMNS ...tests. - move some UT related to
SHOW COLUMNSfromDDLSuitetocommand/ShowColumnsSuiteBaseorv1/ShowColumnsSuiteBase. - move some UT related to
SHOW COLUMNSfromDDLParserSuiteandErrorParserSuitetoShowColumnsParserSuite.
Why are the changes needed?
In AstBuilder, we have a comment that explains as follows:
https://github.com/apache/spark/blob/2a752105091ef95f994526b15bae2159657c8ed0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L5054-L5055
However, in our v2 of the syntax show columns implementation, we did not perform the above checks, as shown below:
withNamespaceAndTable("ns", "tbl") { t =>
sql(s"CREATE TABLE $t (col1 int, col2 string) $defaultUsing")
sql(s"SHOW COLUMNS IN $t IN ns1")
}
-
Before (inconsistent, v1 will fail, but v2 will success) v1:
[SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.v2:
Execute successfully.
so, we should fix it.
-
After (consistent, v1 & v2 all will fail) v1:
[SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.v2:
[SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.
Does this PR introduce any user-facing change?
Yes, for v2 tables, in syntax SHOW COLUMNS {FROM | IN} {tableName} {FROM | IN} {namespace}, if the namespace (second parameter) is different from the namespace of the table(first parameter), the command will succeed without any awareness before this PR, after this PR, it will report an error.
How was this patch tested?
Add new UT.
Was this patch authored or co-authored using generative AI tooling?
No.
cc @cloud-fan @yaooqinn
In my opinion, when constructing the AST, we should always prepend the "ns" identifier for the table identifier. This is because the syntax states that "a.b.c" is IN/FROM "x.y.z". I can't help but think of concatenating them as "x.y.z.a.b.c" instead of zipping. Currently, we only do this for tables without a specified namespace field.
In my opinion, we should not support SHOW COLUMNS t FROM ... in the first place. A fully qualified table name should do the work, there is no need for the FROM clause. But the ship is already sailed and this is what it is now.
In my opinion, we should not support
SHOW COLUMNS t FROM ...in the first place. A fully qualified table name should do the work, there is no need for the FROM clause. But the ship is already sailed and this is what it is now.
FROM and IN are interchangeable. Identifying the tbl or ns is based on the order not FROM and IN keywords, and the 2nd FROM and IN clause is identified as ns
You can identify a table with the qualified name. This is a common syntax in many other commands. SHOW COLUMNS FROM t FROM schema has no benefit but welcomes confusion.
So, I recommend prepending it in the first step for the relation resolution, which can somewhat decrease the confusion
- Yes, in the syntax
SHOW COLUMNS ... {FROM | IN} ... {FROM | IN} ..., the second clause{FROM | IN} ..is actually unnecessary, but due to historical reasons, if we change it to ``SHOW COLUMNS ... {FROM | IN} ..., it is abreak change` (at least when the end-user run it, it will generate an error). - The modification here is only to align this syntax in v1 and v2,
not checkingcan cause confusion for end-users, eg in the following scenario:
SHOW COLUMNS in ns.tbl in ns2.
... which can somewhat decrease the confusion
Yes, but I don't think we should make a breaking change only to reduce confusion for a legacy/unrecommanded feature.
make a breaking change
I acknowledge that it will cause a breaking change for a corner case for V1 tables, SHOW COLUMNS IN DB.TBL IN DB.
@panbingkun can you update Does this PR introduce any user-facing change? I think it does affect v2 tables.
@panbingkun can you update
Does this PR introduce any user-facing change? I think it does affect v2 tables.
Okay
@panbingkun can you update
Does this PR introduce any user-facing change? I think it does affect v2 tables.
Updated.
since it introduces a new error, shall we add a legacy config?
since it introduces a new error, shall we add a legacy config?
Okay, let me to add it.
since it introduces a new error, shall we add a legacy config?
Updated. (spark.sql.legacy.showColumnsCheckNamespace)
ShowColumnsExec was supported a few days ago, what legacy behavior do we have to restore?
@yaooqinn you are right! We don't need a config then :)
@yaooqinn you are right! We don't need a config then :)
Okay, updated!
Merged into master. Thanks @panbingkun @cloud-fan and @yaooqinn
Merged into master. Thanks @panbingkun @cloud-fan and @yaooqinn
Thanks all @LuciferYang @cloud-fan @yaooqinn