dolt icon indicating copy to clipboard operation
dolt copied to clipboard

Loosen identifier validation to match MySQL

Open fulghum opened this issue 1 year ago • 2 comments

We should loosen identifier validation to match MySQL's behavior.

Context from: https://github.com/dolthub/dolt/pull/3779

MySQL's only rules for identifier names are for when they are unquoted. If they quote the identifier they can use any unicode character except null.

I don't see any reason to restrict names except that they must have at least one non-whitespace character in them. Is there a technical reason for being stricter than MySQL here? Something in noms?

Permitted characters in quoted identifiers include the full Unicode Basic Multilingual Plane (BMP), except U+0000:

ASCII: U+0001 .. U+007F

Extended: U+0080 .. U+FFFF

ASCII NUL (U+0000) and supplementary characters (U+10000 and higher) are not permitted in quoted or unquoted identifiers.

fulghum avatar Jul 08 '22 22:07 fulghum

@JCOR11599 did you fix this in your reserved words PR?

timsehn avatar Aug 12 '22 00:08 timsehn

My reserved and unreserved words PRs mostly solve this; they were focused on preventing reserved words in MySQL from parsing in Dolt without backquotes, allowing unreserved words in MySQL to parse in Dolt without backquotes, and not requiring backquotes for any keywords (reserved or unreserved) when using them in a qualified context.

For example, after my changes this is a now a valid query in dolt

dolt> create table `create`(`create` int);
dolt> select create.create from `create`;
+--------+
| create |
+--------+
+--------+

I wasn't able to get this to work for all reserved and unreserved words, but we are significantly closer than before. The reserved words:

"all"
"distinct"
"div"
"key"
"select"
"sql_calc_found_rows"
"straight_join"
"when"

still need to be backquotes in even if you're using them as part of a qualified identifier. A few words are also reserved in dolt, when they should be unreserved, and viceversa. I have tests in vitess that document all the inconsistencies.

As far as I can tell, the original list of reserved and unreserved words from vitess were pretty arbitrary; they might've just caused too many conflicts, so they just placed them wherever it worked.

jycor avatar Aug 12 '22 21:08 jycor

This is fixed in the latest release. We no longer validate table names beyond what mysql requires.

zachmu avatar Jun 28 '23 18:06 zachmu