greptimedb
greptimedb copied to clipboard
Compatible with Postgres system catalog
What type of enhancement is this?
User experience
What does the enhancement do?
Since we want to support users to access GreptimeDB with psql, we'd better be more compatible when possible.
Currently, when you run psql -l
against GreptimeDB, it would fail with:
psql: ERROR: Failed to plan SQL: Error during planning: Table not found: greptime.pg_catalog.pg_database
Implementation challenges
This should be similar to MySQL's information
Not quite easy. Seems related to some internal functions also:
Failed to execute query: SELECT d.datname as "Name",
pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
pg_catalog.array_to_string(d.datacl, '\n') AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1; err=0: Table operation error, at src/frontend/src/instance.rs:272:14
1: Failed to plan statement, at /Users/tison/GreptimeWorkspace/greptimedb/src/operator/src/statement.rs:253:14
2: Failed to plan SQL, at src/query/src/planner.rs:90:14
...
And this is the query for "show tables" (\dt
):
Failed to execute query: SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 't' THEN 'TOAST table' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned index' END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','p','')
AND n.nspname <> 'pg_catalog'
AND n.nspname !~ '^pg_toast'
AND n.nspname <> 'information_schema'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2; err=0: Table operation error, at src/frontend/src/instance.rs:272:14
1: Failed to plan statement, at /Users/tison/GreptimeWorkspace/greptimedb/src/operator/src/statement.rs:253:14
2: Failed to plan SQL, at src/query/src/planner.rs:90:14
I did some research about this before, it needs a lot work, and most of them had been done by risingwave, maybe you can use some code from them, and make less effort for this.
FYI: you can refer this issue. some of them can easily add in greptime others maybe need to add in pgwire
https://github.com/risingwavelabs/risingwave/issues/2954
Yes, we are doing a lot of work to be compatible with MySQL, pg is our next goal.
Yes, we are doing a lot of work to be compatible with MySQL, pg is our next goal.
One thing came to my mind that databend, risgingwave and greptime all using rust and all maybe have the need to implement protocols risingwave mainly for pg, and databend mainly for mysql, maybe you can extract something common and help each other, maybe its a triple-win thing.
Let's add pgwrite into databend!
I would like to get involved in this work.😊
@J0HN50N133 Welcome! You can take a look at how we implement the "information_schema" tables as a reference.
But I'm also still unaware of how to add the necessary functions or work it around.
@tisonkun
BELOW IS ALSO THE PROPOSAL SUMITTED TO OSPP
To support pg_catalog
related features. The main challenge is to support the System Information Functions and Operators.
I will try to add a new schema called pg_catalog
and related table for trivial cases like \d
under src/catalog/pg_catalog
and decide how to implement the system function according to what happen in DFLogicalPlanner
. Related system functions may be implemented via udf
.
Although SHOW TABLES
and SHOW DATABASES
can also work even under psql
, pg users may still depend on pg_catalog
Goals
- support normal system catalog related query(trivial statements first)
- pg_catalog should be optional, users shouldn't pay for this if they don't need pg. Data should be able to build from information_schema.
System Catalogs implementation plan
In the current version, tables following are possible to support:
- pg_database
- pg_namespace
- pg_class
- pg_trigger
- pg_collation
- pg_trigger
- pg_partitioned_table
- pg_statistic
The first three tables will be implemented first since they are basis.
Implementation
pg_catalog management
-
add a new schema called
pg_catalog
and related table for trivial cases like\d
undersrc/catalog/pg_catalog
. Create a new SchemaProvider calledPgCatalogProvider
and mount it underSystemCatalog
. Then while finding table for query,catalog_manager
will first checksystem_catalog
. https://github.com/GreptimeTeam/greptimedb/blob/65d47bab5607b60aa25c50f57e53b762fe979c68/src/catalog/src/kvbackend/manager.rs#L252 -
Each System table is implemented as a struct under
pg_catalog/
. To make them work as aTable
, we just need to do some adaption between those implementation of pg_catalog.tables andTable
. -
Since view is supported recently in #3807 , view related catalog function should also be supported. The
id
of view is also implemented asTableId
in #3807. So we could reuseTableId
asoid
for view as we will do for table. -
There is currently no concept of
Namespace
in greptime; to workaround, it can be considered equivalent to adatabase
, i.e. each database contains only one namespace. To provide a comparison, the data organization in greptime and postgres is as follows: Pg: Database -> Namespace(Schema) -> Table Gt: Catalog -> Database(Schema) -> Table (we can't create new Catalog currently)
pg_catalog related system function
Implement them with datafusion udf. Like:
https://github.com/GreptimeTeam/greptimedb/blob/578dd8f87a2897fa586e5833fead18cc24d1912a/src/common/function/src/function_registry.rs#L90
I'll manage to provide similar features in #2931.
Tasks
- [ ] pg_database
- [x] pg_namespace
- [x] pg_class
- [x] pg_collation, not supported yet, just like the mysql ver. did
- [x] pg_trigger, we don't support it.
- [x] pg_type
- [ ] pg_partitioned_table
- [ ] pg_statistic
While supporting \dt
, i.e. the following sql statement, I met some problems:
SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 't' THEN 'TOAST table' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned index' END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','p','')
AND n.nspname <> 'pg_catalog'
AND n.nspname !~ '^pg_toast'
AND n.nspname <> 'information_schema'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
-
n.oid = c.relnamespace
, this predicate need anamespace.oid: u32
to check equality, which can be seen asschema
'sid
in greptime. However, there is no such thing likeschema_id
in currentSchemaManager
. https://github.com/J0HN50N133/greptimedb/blob/906abd61de5e6da4e2ae5bba3f13fea567509166/src/common/meta/src/key/schema_name.rs#L54 Should we add this field inSchemaManager
? This may break backward compatibility. -
pg_get_userbyid
: it seems that user authentication is not supported in the open source codebase. Should I just return empty string?
@tisonkun WDYT?
@J0HN50N133 Thanks for pushing forward this ticket!
pg_get_userbyid: it seems that user authentication is not supported in the open source codebase. Should I just return empty string?
Yeah. This is reasonable as a first implementation. cc @shuiyisong @zyy17 later we may consider how to integrate it with the plugin system to allow populate user info in this case.
n.oid = c.relnamespace
Adding ID for schema can be a breaking change and introduce overload burden to overcome in this issue.
cc @killme2008 @waynexia please take a look at this information schema related logics. A workaround can be using the schema name as "oid" in this place.
@tisonkun
A workaround can be using the schema name as "oid" in this place.
The shortcomings is the potential incompatibility with existing pg eco.
The shortcomings is the potential incompatibility with existing pg eco.
Yeah. Some how the type is different, but comparing should work.
cc @killme2008 @waynexia what's the affect if we try to add id to each schema?
Next I will try to support dbeaver connection to greptime, via pg protocol. The following sql is the statements fail.
SELECT typcategory FROM pg_catalog.pg_type WHERE 1<>1 LIMIT 1
SELECT t.oid,t.*,c.relkind,NULL as base_type_name, d.description
FROM pg_catalog.pg_type t
LEFT OUTER JOIN pg_catalog.pg_class c ON c.oid=t.typrelid
LEFT OUTER JOIN pg_catalog.pg_description d ON t.oid=d.objoid
WHERE t.typname IS NOT NULL
AND (c.relkind IS NULL OR c.relkind = 'c')
@J0HN50N133 I just found that tables like pg_class
, pg_namespace
are hidden tables that available under all schemas, which means some client can run sql like:
SELECT c.relname FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relkind in ('r', 'p')
without calling pg_catalog.pg_class
.
One idea is to resolve those table like pg_class
, pg_namespace
and etc. under any schema to pg_catalog.*
@J0HN50N133 I just found that tables like
pg_class
,pg_namespace
are hidden tables that available under all schemas, which means some client can run sql like:SELECT c.relname FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relkind in ('r', 'p')
without calling
pg_catalog.pg_class
.One idea is to resolve those table like
pg_class
,pg_namespace
and etc. under any schema topg_catalog.*
@sunng87 Yeah, #4543 maybe also relate to this, for example, this hidden table should not appear in MySQL client. Maybe we modify DfTableSourceProvider to solve both problems.