`tables` column in `sys.snapshots` is missing quotes
CrateDB version
5.3.0
CrateDB setup information
No response
Problem description
Table names in sys.snapshots's tables array column aren't quoted, making it hard to use its values.
Steps to Reproduce
CREATE TABLE doc."I-need-quotes" (a INTEGER);
CREATE REPOSITORY demo_repository TYPE s3 WITH (bucket = '...', base_path = 'debug', access_key = '...', secret_key = '...');
CREATE SNAPSHOT demo_repository.s1 TABLE doc."I-need-quotes";
SELECT tables[1]
FROM sys.snapshots
WHERE name = 's1';
-- returns doc.I-need-quotes
In a situation where I want to automate dropping tables before a restore, I may want to write queries such as:
SELECT 'DROP TABLE ' || t
FROM (
SELECT UNNEST(tables) AS t
FROM sys.snapshots
WHERE name = 's1'
);
-- returns DROP TABLE doc.I-need-quotes (will fail, missing quotes)
-- or
SELECT 'DROP TABLE ' || QUOTE_IDENT(t)
FROM (
SELECT UNNEST(tables) AS t
FROM sys.snapshots
WHERE name = 's1'
);
-- returns DROP TABLE "doc.I-need-quotes" (will fail, incorrectly quoted)
-- only working alternative
SELECT 'DROP TABLE ' || QUOTE_IDENT(SPLIT_PART(t, '.', 1)) || '.' || QUOTE_IDENT(SPLIT_PART(t, '.', 2))
FROM (
SELECT UNNEST(tables) AS t
FROM sys.snapshots
WHERE name = s1'
);
Actual Result
See above.
Expected Result
Return a properly quoted name (doc."I-need-quotes") in tables that can easily be used further. Alternatively, split up table_name and schema_name into separate columns, so QUOTE_IDENT can get applied to them.
Imho we'd want to keep having this table names in sys.snapshots in sync with pg_tables and information_schema.tables , and in those we don't have " when they are need, and we are in sync with postgres:
matriv=> create table "55-tbl-needs-quotes"(a int);
CREATE TABLE
matriv=> select table_name from information_schema.tables where table_name like '55%';
table_name
---------------------
55-tbl-needs-quotes
(1 row)
matriv=> select tablename from pg_tables where tablename like '55%';
tablename
---------------------
55-tbl-needs-quotes
(1 row)
matriv=> select * from 55-tbl-needs-quotes;
ERROR: syntax error at or near "55"
LINE 1: select * from 55-tbl-needs-quotes;
^
matriv=> select * from "55-tbl-needs-quotes";
a
--11-
(0 rows)
However, in your case, @matriv, the schema and table names are returned separately in information_schema.tables/pg_tables. In the sys.snapshots example, a fully qualified name is returned.
I think not applying quoting for individually returned schema/table names are fine. But in case a FQN is returned, it would be great if it was correctly quoted. Otherwise, the FQN needs to be split again into schema/table name by the client to fix it.
It sounds like a valid point, thx for explaining it further!
Hello, noobie contributor here. I'd like to tackle this issue, but am struggling to understand where to start. Doing a file search of sys.snapshots didn't help a ton. Could you give me a little bit of insight on where to start looking?
Thx for your interest @RichardIcecube, the class you want to check is SysSnapshotsTableInfo , there you can see that the method to adjust is SysSnapshot#tables()
I'm not sure if it's a good idea to just add quoting now after the fact. The workaround posted would stop working so it is kinda a breaking change.
Maybe instead we could add an additional object array with table_name and schema_name, similar to how table_partitions already works.
Then users can select what they want and make use of quote_ident if they need the quotes.
@matriv Thank you for helping me locate the points of interest in the program. I will do my best to build an understanding of how the code works.
I apologize for being a bit green, but I'm trying my best to follow along. To clarify what I need to do, @mfussenegger, are you suggesting creating another object array along with the existing one labeled table_partitions where there are additional table_name and schema_name columns?
If so, my current understanding of doing this would involve editing SysSnapshotsTableInfo#create() include an additional object array along with expanding SysSnapshot to accommodate the additional object array.
If I'm mistaken at all on this process, please correct errors
are you suggesting creating another object array along with the existing one labeled table_partitions where there are additional table_name and schema_name columns?
Yes. (But I don't really have a good name for it)
If so, my current understanding of doing this would involve editing SysSnapshotsTableInfo#create() include an additional object array along with expanding SysSnapshot to accommodate the additional object array.
SysSnapshot could probably expose a List<RelationName> (RelationName has both schema and name properties).
See https://github.com/crate/crate/blob/8c7ea46245e832eb1324f2c92be5d32b92611473/server/src/main/java/io/crate/expression/reference/sys/snapshot/SysSnapshot.java#L93-L98
and:
https://github.com/crate/crate/blob/8c7ea46245e832eb1324f2c92be5d32b92611473/server/src/main/java/io/crate/metadata/RelationName.java#L90-L93
Unfortunately, I am a bit out of my depth here. Best of luck to whoever decides to tackle this issue!
no worries @RichardIcecube, and apologies, I shouldn't have rushed to put the good first issue label on this, as this needs first some more careful thought.
Imho we'd want to keep having this table names in sys.snapshots in sync with pg_tables and information_schema.tables , and in those we don't have " when they are need, and we are in sync with postgres:
As far as I am aware, PostgreSQL doesn’t provide fully qualified names anywhere in the system views. While the feature request to include separate schema and table names might make it easier to work with other tables, I still consider the current behaviour in the tables column a bug that should be fixed separately. This currently also leads to a bug in CrateDB Cloud.
While the feature request to include separate schema and table names might make it easier to work with other tables, I still consider the current behaviour in the tables column a bug that should be fixed separately.
The original bug is that we expose a fqn containing both the schema and the table name in one column. This is not correct as a any quote_ident functionality is always about quoting an identifier and not a FQN table name. But for BWC reasons we cannot just drop this column or change it's type/value.