postgrest Schema-based, incremental versioning

I was recently pointed at https://begriffs.com/posts/2014-12-30-intro-to-postgrest.html, where Joe presented the core ideas that PostgREST was built on initially. From 8:30 on, he talks about schema-based versioning with fall-through. I think this is a great idea. I guess we have since moved on from the Accept: ...; version= parameter - and that's fine.

However, I'd like to pick up on the fall-through idea again and I think we can improve the schema-based versioning experience we currently have by far.

Right now, we support multiple schemas through Accept-Profile and Content-Profile headers. However, the strict separation between schemas in this case, makes the profile headers more useful for multi-tenancy setups and not for (incremental) versioning.

One other thing to consider is, that there are multiple different ways in how you can introduce breaking changes to your API - for which you then need api versioning: The most obvious, of course, is when you change the exposed schema in a breaking way. However, breaking changes can also be introduced by upgrading PostgREST itself to a new major version - or by just changing one of PostgREST's config options. To stay backwards compatible in those cases it might make sense to run multiple instances of PostgREST (older and newer versions) at the same time. In this case a versioning approach based on schema and authentication role is useful:

-- main authenticator role -grant all api roles to this role
-- this is not used for login directly anymore
CREATE ROLE authenticator;

-- this role is used for the PostgREST instance that serves api version 1
CREATE ROLE v1 IN ROLE authenticator LOGIN;
ALTER ROLE v1 SET pgrst.db_schemas TO 'v1';
ALTER ROLE v1 SET pgrst.raw_media_types TO 'text/plain';

CREATE SCHEMA v1;
-- ...

-- now we introduce some breaking changes in v2:
CREATE ROLE v2 IN ROLE authenticator LOGIN;
ALTER ROLE v2 SET pgrst.db_schemas TO 'v2';
-- this is a breaking change
ALTER ROLE v2 SET pgrst.raw_media_types TO '';

CREATE SCHEMA v2;
-- ...

We can run two PostgREST instances at the same time now, one of each will login with v1 or v2 respectively. I guess the most used setup would then be to map URI prefixes of /api/v1/... and /api/v2/... to those instances via nginx for example.

This should currently work just fine. I guess we could extend the documentation about this approach somewhere.

One thing that's annoying, though, is that the full api needs to be re-created in the v2 schema to be able to expose all paths.

If we could use incremental versioning instead, we could just create those objects that changed in the new schema and still rely on the old v1 objects for those that didn't change, so something like this:

CREATE VIEW v1.clients AS ...;
CREATE VIEW v1.projects AS ...;

-- we need to update `projects` in a breaking way, so we do:
CREATE VIEW v2.projects AS ...;

-- now, if we set the search_path properly, we can do:
SET search_path TO v2, v1;

SELECT * FROM clients; -- uses v1.clients
SELECT * FROM projects; -- uses v2.projects

This would be incremental versioning.

How can we achieve something like that?

We were discussing several ways of storing our own metadata about database objects inside the database (via comment, via some table + utility functions, ...). If we were to store something like extends: v1 for the v2 schema, we could still use the same syntax we currently use in db-schemas = v2 - just when querying for the schema cache, we could extend our search for exposable objects to that extended schema - recursively.

Feb 14 '22 21:02 wolfgangwalther

Right now, we support multiple schemas through Accept-Profile and Content-Profile headers. However, the strict separation between schemas in this case, makes the profile headers more useful for multi-tenancy setups and not for (incremental) versioning.

Very true. We might as well map different Accept-Profile to different databases and do database-based multitenancy.

One thing that's annoying, though, is that the full api needs to be re-created in the v2 schema to be able to expose all paths. SET search_path TO v2, v1;

Yeah, duplicating db objects it's really bad. I also explored the search_path idea on https://github.com/PostgREST/postgrest/issues/1106 but IIRC we couldn't just leave the schema resolving to pg because the search_path also implicitly includes pg_catalog, which meant that the API would expose things like pg_roles.

How can we achieve something like that? We were discussing several ways of storing our own metadata about database objects inside the database (via comment, via some table + utility functions, ...).

I remember that(on https://github.com/PostgREST/postgrest/issues/2028#issuecomment-973832177), so it would be like:

COMMENT ON VIEW v2.tasks $$
---
pgrst:
  extends: v1
  date: 2022-02-24  
$$;

With dates perhaps we can also do something similar to Stripe versioning.

Feb 25 '22 04:02 steve-chavez

Very true. We might as well map different Accept-Profile to different databases and do database-based multitenancy.

The Profile headers are still annoying to use - because of the different headers for GET and POST etc..

I also think that, since the headers describe the format of the body, they are not really suited for multi-tenancy. Multi-tenancy is about selecting different resources - not the same resources in a different body format.

Looking at it from a http perspective, I think the thing that maps to different databases the best is the hostname in the request. We are not using that at all, yet.

Schemas are best mapped to a path prefix, imho.

Yeah, duplicating db objects it's really bad. I also explored the search_path idea on #1106 but IIRC we couldn't just leave the schema resolving to pg because the search_path also implicitly includes pg_catalog, which meant that the API would expose things like pg_roles.

Hm. Maybe we can use the search_path only at schema cache creation time?

I remember a while back we discussed somewhere else to only expose all kinds of routes explicitely, not only RPCs, but also tables/views. I.e. if it's not in the schema cache, it will throw an error. If we did that, we could use the search path strategy to find exposed endpoints for the schema cache. So the schema cache for v2 would contain everything from v2 and v1.

I remember that(on #2028 (comment)), so it would be like:
COMMENT ON VIEW v2.tasks $$
---
pgrst:
  extends: v1
  date: 2022-02-24  
$$;

Uh, I wanted to put the metadata/comment on the SCHEMA, not each individual object. Adding a comment for all database objects would be quite cumbersome, I think.

With dates perhaps we can also do something similar to Stripe versioning.

In this case you would need a new schema for each change, to have a new date associated with it. If you do that, you don't need the v1 as the schema name anymore, but you should instead do something like this:

-- (using a shorter syntax for comments for clarity)
-- ...
CREATE SCHEMA "2022-02-24";
COMMENT ON SCHEMA "2022-02-24" $$extends: 2022-02-23$$;
CREATE SCHEMA "2022-02-25";
COMMENT ON SCHEMA "2022-02-25" $$extends: 2022-02-24$$;
ALTER ROLE v2 SET pgrst.db_schemas TO '2022-02-25, 2022-02-24, 2022-02-23';
-- ...

So basically every change is a new schema, but only for (major) breaking changes you'd create a new role, which maps to a new path prefix via reverse proxy.

And then, with what we have right now, you could use the profile headers to select the sub-version / schema to use for that request.

Feb 25 '22 07:02 wolfgangwalther

Schemas are best mapped to a path prefix, imho.

I've realized that path based schemas do not conform to REST because schemas are an implementation detail(namespaces) that can change, if that happens it would break the URI.

Oct 13 '22 19:10 steve-chavez

I've realized that path based schemas do not conform to REST because schemas are an implementation detail(namespaces) that can change, if that happens it would break the URI.

It's a matter of definition what constitutes an implementation detail and what doesn't. Once you make schemas part of the URI, they are not an implementation detail anymore of course. But that's not a problem. Schema names can change - but so can table names or column names. And those are no implementation details, either.

You can turn it around and look at it from the http perspective: Assume you'd like to create two separate endpoints with the same name. That's not possible. You need a namespace for that. So you're going to use.... a path prefix as a namespace. This is done everywhere.

So a path prefix is nothing else than a namespace, too. Schemas are a great match.

Oct 14 '22 11:10 wolfgangwalther