materialize icon indicating copy to clipboard operation
materialize copied to clipboard

coord: search_path does not behave as in Postgres: wrong default, can not be modified, not used for CREATE

Open philip-stoev opened this issue 3 years ago • 4 comments

What version of Materialize are you using?

materialized v0.7.4-dev (82ae64f89)

What was the issue?

search_path is not handled correctly in Materialize. In particular, the following does not work:

  1. search path can not be changed
materialize=> set search_path to 'abc';
ERROR:  parameter "search_path" cannot be changed
  1. CREATE TABLE is supposed to create the table in the first schema of the search path. However, in Mz, the search path is:
materialize=> SHOW search_path;
               search_path               
-----------------------------------------
 mz_catalog, pg_catalog, public, mz_temp

and yet all the tables are created in the public schema by default.

  1. The default for search_path is different from the Postgres default, where it is:
pstoev=# show search_path;
   search_path   
-----------------
 "$user", public
(1 row)

philip-stoev avatar May 14 '21 09:05 philip-stoev

FYI this is also a blocker for working Prisma.io integration.

Using SET search_path = "public"; leads to an ReadOnlyParameter Error. I see that the search path cannot be changed, but it might make sense to simply ignore the command if it tries to set the search path to public anyway.(https://github.com/MaterializeInc/materialize/blob/main/src/coord/src/session/vars.rs#L342)

via Community Slack

ruf-io avatar Jan 07 '22 14:01 ruf-io

but it might make sense to simply ignore the command if it tries to set the search path to public anyway. This is what we do for NAMES = utf8 as of https://github.com/MaterializeInc/materialize/pull/9900, but I am not sure what we should do when showing search_path

internally, we filter out the non-pub ones for current_schemas, but not in other cases , as far as I can tell https://github.com/MaterializeInc/materialize/blob/main/src/coord/src/catalog.rs#L2728-L2743

guswynn avatar Jan 07 '22 18:01 guswynn

As per https://github.com/MaterializeInc/developer-experience/issues/96 this seems to be a blocker for PostgreSQL FDW.

bobbyiliev avatar Apr 06 '22 20:04 bobbyiliev

#12222 fixed most of this. Looks like the remaining work item is to make CREATE statements create their objects in the first schema on the search path, rather than in public.

benesch avatar Aug 10 '22 05:08 benesch

It turns out there's a fair amount of work remaining here:

  • The SQL parser needs to be extended to support taking multiple values for a SET statement.
  • The variable layer needs to handle multi-valued variables. The rules are weird. E.g., SET DateStyle = 'ISO, MDY' is equivalent to SET DateStyle = ISO, MDY, but SET search_path = a, b is not equivalent to SET search_path = 'a, b'.
  • The current_schema and current_schemas functions need to be adjusted to show the resolved search path, rather than the unresolved value of the search_path variable
  • DDL statements need to be adjusted to create objects in the first schema on the search path, as mentioned above.

benesch avatar Oct 09 '22 14:10 benesch

Here's a very stale branch that started work on teaching the SQL parser and variable layers to handle multi-valued variables: https://github.com/MaterializeInc/materialize/compare/main...benesch:materialize:vars-and-search-path?expand=1

benesch avatar Feb 06 '23 06:02 benesch

A tweak about DDL schemas: they create the object on the first schema that exists in search_path. "$user", public is the default in pg, but if the $user schema doesn't exist, it'll get made on public.

maddyblue avatar Feb 06 '23 20:02 maddyblue

Other problems we have current_schemas in postgres returns an array of schemas that exist from the set specified in search_path, whereas we don't consult the catalog to see what schemas exist. This same logic happens during DDL schema resolution. Evidence is mounting that current_schemas should be a variable set in the StatementContext which the planner has access to.

We already added support for multi value variables in an earlier PR, so the above branch is roughly too out of date to both with the merge conflicts.

mjibson=# select current_schemas(true);
       current_schemas       
-----------------------------
 {pg_catalog,mjibson,public}
(1 row)

Time: 0.331 ms
mjibson=# select current_schemas(false);
 current_schemas  
------------------
 {mjibson,public}
(1 row)

Time: 0.200 ms
mjibson=# drop schema mjibson;
DROP SCHEMA
Time: 1.686 ms
mjibson=# select current_schemas(false);
 current_schemas 
-----------------
 {public}
(1 row)

Time: 0.334 ms
mjibson=# show search_path;
   search_path   
-----------------
 "$user", public
(1 row)

Time: 0.213 ms
mjibson=# drop schema public;
DROP SCHEMA
Time: 3.092 ms
mjibson=# show search_path;
   search_path   
-----------------
 "$user", public
(1 row)

Time: 0.233 ms
mjibson=# select current_schemas(false);
 current_schemas 
-----------------
 {}
(1 row)

maddyblue avatar Feb 06 '23 21:02 maddyblue

We already added support for multi value variables in an earlier PR, so the above branch is roughly too out of date to both with the merge conflicts.

I remember looking at that PR when it went past, and thinking that it was not PostgreSQL compatible. The PostgreSQL rules for multi-valued variables are very weird.

On Mon, Feb 6, 2023 at 4:02 PM Matt Jibson @.***> wrote:

Other problems we have current_schemas in postgres returns an array of schemas that exist from the set specified in search_path, whereas we don't consult the catalog to see what schemas exist. This same logic happens during DDL schema resolution. Evidence is mounting that current_schemas should be a variable set in the StatementContext which the planner has access to.

We already added support for multi value variables in an earlier PR, so the above branch is roughly too out of date to both with the merge conflicts.

mjibson=# select current_schemas(true); current_schemas

{pg_catalog,mjibson,public} (1 row)

Time: 0.331 ms mjibson=# select current_schemas(false); current_schemas

{mjibson,public} (1 row)

Time: 0.200 ms mjibson=# drop schema mjibson; DROP SCHEMA Time: 1.686 ms mjibson=# select current_schemas(false); current_schemas

{public} (1 row)

Time: 0.334 ms mjibson=# show search_path; search_path

"$user", public (1 row)

Time: 0.213 ms mjibson=# drop schema public; DROP SCHEMA Time: 3.092 ms mjibson=# show search_path; search_path

"$user", public (1 row)

Time: 0.233 ms mjibson=# select current_schemas(false); current_schemas

{} (1 row)

— Reply to this email directly, view it on GitHub https://github.com/MaterializeInc/materialize/issues/6754#issuecomment-1419748851, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXSICRDJIHYM3UZ6VQSSDWWFRGHANCNFSM444EME6A . You are receiving this because you commented.Message ID: @.***>

benesch avatar Feb 06 '23 22:02 benesch

Agree. The use of both Literals and Identifiers now seems suspicious, and we probably need to refactor to support both. Working this out.

maddyblue avatar Feb 06 '23 23:02 maddyblue

Happy to try to find some time to chat this out if helpful. I spent a fair bit of time reading the PostgreSQL source for this, and can probably page those memories back in without too much effort. At the very least, game to do the code review.

On Mon, Feb 6, 2023 at 6:12 PM Matt Jibson @.***> wrote:

Agree. The use of both Literals and Identifiers now seems suspicious, and we probably need to refactor to support both. Working this out.

— Reply to this email directly, view it on GitHub https://github.com/MaterializeInc/materialize/issues/6754#issuecomment-1419918965, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXSIE2F7ZCIFY65X4KNHTWWGAM5ANCNFSM444EME6A . You are receiving this because you commented.Message ID: @.***>

benesch avatar Feb 06 '23 23:02 benesch