Unambiguous escape pattern for special characters in tablenames.
See the discussion in https://github.com/electric-sql/electric-next/issues/173
The link is broken so i only have the information that is in the title of this issue. I did some tests to see if we an use special characters in table and column names. Everything seems to work properly. This are the things i tested and can confirm work:
- Case sensitivity works for table and column names
- Special characters work (although may need to be escaped in bash, e.g. exclamation mark needs to be escaped:
HTTP GET "localhost:3000/v1/shape/\!fooBar-baz" offset==-1) - Spaces in table names and column names work
Note that any where clause needs to be properly quoted when using special characters and may require additional escaping for special bash characters.
@kevin-dp the link is broken because someone disabled the Issues feature on the old repo. It was about making sure that a fully-qualified table name, i.e. one including the namespace, can be parsed unambiguously even if the namespace or the table name proper has a dot in its name.
@kevin-dp the link is broken because someone disabled the Issues feature on the old repo. It was about making sure that a fully-qualified table name, i.e. one including the namespace, can be parsed unambiguously even if the namespace or the table name proper has a dot in its name.
I see. Currently, we do not support schemas or tables containing a dot in their name.
The validate_table function splits the (qualified) table name on dots but expects at most one dot and if a dot is present it assumes that it delimits the schema from the table name.
So schema.table works as long as schema and table don't contain dots, but these cases don't work:
a.weird.table.name--> error: "table name does not match expected format" because there is more than 1 dotmy.table--> doesn't work if that's the table name because it is interpreted asmybeing the schema andtablebeing the table name
To properly fix this we should support fully qualified table names that use quotes, consistently with how Postgres does it. e.g. foo.bar would be interpreted as schema foo and table bar and "foo"."bar.baz" would be schema foo and table bar.baz
We can distinguish 2 cases for names in Postgres:
- The standard case where names start with a-z or underscore, followed by zero or more letters, digits, or underscores (cf. https://www.postgresql.org/docs/7.0/syntax525.htm).
- Quoted names: any character is valid inside a quoted name, even quotes are possible but they need to be escaped by doubling the quote, e.g.
"!A.weird""Table Name"
The standard case can be validated with the regex: [a-z_][a-zA-Z0-9_]*.
Quoted names can be validated with the regex: "(""|[^"])+".
Now, Postgres table names can be specified in two ways:
- Non-qualified table name, e.g.
fooand"123foo" - Qualified table name
<schema>.<table>, e.g.foo.barand"foo.bar"."baz"
So, we can combine our two regexes from before to validate table names:
^((?<schema>([a-z_][a-zA-Z0-9_]*|"(""|[^"])+"))\.)?(?<table>([a-z_][a-zA-Z0-9_]*|"(""|[^"])+"))$
I'm using named captured groups to fetch the schema and table name.
@kevin-dp I wrote a proper parser for pg table names, including managing case if unquoted etc:
https://github.com/electric-sql/electric-old/blob/main/components/electric/lib/electric/postgres/name_parser.ex
might be useful.
tests:
https://github.com/electric-sql/electric-old/blob/main/components/electric/test/electric/postgres/name_parser_test.exs
Hi @magnetised, is there any case i overlooked that is not handled by the regex and requires a custom parser? From the tests i conducted, the regex seems to properly parse quoted and unquoted table names (fully qualified or not). Here are the tests i wrote: https://github.com/electric-sql/electric/pull/1518/files#diff-4644657568b5299a6644a6246af3e0336cc3d2de067825fe4dd8b07c18a1cda5R124
EDIT: i guess the difference is that your parser converts uppercase letters to lowercase when not quoted and this regex doesn't. I'm not entirely sure if we need it or not. We could say that the root table name you pass to the shape API is case sensitive even when unquoted. If we want to fully mimick Postgres and make it case insensitive when unquoted, we could adapt the implementation to turn uppercase letters into lowercase letters here: https://github.com/electric-sql/electric/pull/1518/files#diff-5787f025bb71bdf62cf499062ac24a2ef70540543bcb53eb3a3f9dc0e64e0b68R220
@kevin-dp @magnetised let's try to close this issue
@kevin-dp awesome regexing. If you've nailed it then no need for the parser, though I always feel safer with a parser personally.
Ok, closing this issue has it has been addressed.