postgrest [Feature Request] HTTP headers filtering - whitelist transaction-scoped settings

It's good to introduce an option for filtering HTTP headers passed to PostgreSQL.

For example, request-headers-allowlist, with default value is * (meaning all headers are passed).

There're many unnecessary set_config() commands in every transaction.

I've gotten 22 params at all, where a 50% of them are HTTP headers, a lot of people doesn't use (or uses small portion) it. We use only two of 11.

Sep 09 '21 16:09 dwagin

There're many unnecessary set_config() commands in every transaction.

Related to https://github.com/PostgREST/postgrest/issues/1857. We'll need to set all the headers in a single set_config for postgres 14.

When we do that, perhaps passing all the headers won't be a problem?

For example, request-headers-allowlist, with default value is * (meaning all headers are passed).

I see no harm in offering such a config option though, it would reduce the passed json payload for #1857 as well.

Sep 09 '21 17:09 steve-chavez

A different approach, compared to another config option, that achieves the same thing, but is much more flexible down the road is mentioned in https://github.com/PostgREST/postgrest/pull/1710#issuecomment-753306915.

Sep 13 '21 07:09 wolfgangwalther

I see no harm in offering such a config option though, it would reduce the passed json payload for #1857 as well.

@steve-chavez, of course, proposed enhancement would reduce JSON-payload, that decrease request overhead time.

A different approach, compared to another config option, that achieves the same thing, but is much more flexible down the road is mentioned in #1710 (comment).

@wolfgangwalther, your enhancement doesn't filter headers on PostgREST level, only on PostgreSQL level. Therefore, these features can complement each other, not replace.

Sep 16 '21 16:09 dwagin

@wolfgangwalther, your enhancement doesn't filter headers on PostgREST level, only on PostgreSQL level.

That's not correct. Implementing the full proposal would allow filtering the headers on the PostgREST level already. It would be set up via arguments to a pre-request function - but those headers that are not used, will never be sent to PostgreSQL.

Sep 16 '21 17:09 wolfgangwalther

@wolfgangwalther So I don't understand how it works, please tell me in more detail.

Sep 16 '21 17:09 dwagin

@wolfgangwalther So I don't understand how it works, please tell me in more detail.

The proposal starts by stripping all set_config(...)s - so you'd start from zero. No headers transmitted to PostgreSQL at all.

At schema cache creation time, we'd read the argument list of the pre-request function definition. If that reads something like CREATE FUNCTION your_pre_req ("headers.your_first_header" TEXT, "headers.your_second_header" TEXT) ... only those two headers would be passed as arguments to the pre-request function. If you needed the headers in any other place than the pre-request function body, you could then call set_config(...) yourself in the pre-request function to make it available.

Thereby the filter will apply at schema cache creation time - and therefore at the PostgREST level, before getting to PostgreSQL at all during the actual query.

Does that make sense?

Sep 20 '21 12:09 wolfgangwalther

@wolfgangwalther The pre-request approach would impact performance though(one extra roundtrip to the db), we discussed that somewhere. Unless we solve that part(by joining the pre-request call to the initial set locals), the request-headers-allowlist config is looking much simpler.

Sep 20 '21 22:09 steve-chavez

@wolfgangwalther now everything is clear. IMHO main issue of this solution is function parameters, it is not very flexible and difficulty updating under load.

I would prefer something like CREATE FUNCTION your_pre_req (headers json) ... + request-headers-allowlist. This solves the PG14 compatibility issue and provides flexibility.

@steve-chavez of course pre-request must be join to initial statement, one nuance as far as I know the order of the call is not defined in SELECT func1(), func2(), ..., so the pre-request must be defined with schema.

Sep 21 '21 12:09 dwagin

Unless we solve that part(by joining the pre-request call to the initial set locals)

Yes, I assumed that would be part of implementing that. But I can see how we'll face the same problem with SET ROLE here, that we would face in the main query.

So maybe we should join the pre-request function to the main query instead - and keep the SET ROLE (without all the other set_configs) as the first step. After all, we're quite certain that we can run the pre-request function before everything else - just not 100% as in "we can guarantee it will never happen". That's why we can't use it for SET ROLE, as this would impact security potentially.

@wolfgangwalther now everything is clear. IMHO main issue of this solution is function parameters, it is not very flexible and difficulty updating under load.

I would prefer something like CREATE FUNCTION your_pre_req (headers json) ...

Well, the proposal includes the ability to create a headers json argument to get all headers.

I don't really see the difficulty of updating under load, yet, but I haven't thought about that too much either. Could you explain where you see the problem?

Sep 21 '21 12:09 wolfgangwalther

Well, the proposal includes the ability to create a headers json argument to get all headers.

Yes, I understand that needs to be added request-headers-allowlist.

I don't really see the difficulty of updating under load, yet, but I haven't thought about that too much either. Could you explain where you see the problem?

For example:

We have two PostgREST for HA
We want to add additional header handling

Our actions:

Add new function with additional parameter (we must continue to serve requests)
Reload the schema cache of the first PostgREST (which function will PostgREST choose?, functions should be with different names?)
Reload the schema cache of the second PostgREST
Drop old function

Our actions (single JSON):

Add new header to request-headers-allowlist
Reload all PostgREST instances (order is not important)
Replace pre-request function

I think the first case is more difficult than the second. Especially when DevOps and DBA are different people.

Sep 21 '21 13:09 dwagin

Thanks for the walk-through.

2\. Reload the schema cache of the first PostgREST (which function will PostgREST choose?, functions should be with different names?)

I think this is the core of the issue. Once we create pre-request functions with different arguments, we can overload the same function. This will then quickly lead to a situation, where PostgREST needs to decide which of those to call later, while reloading the schema cache.

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar. If multiple functions are registered for the same hook, then all of them would be called.

This would allow to:

CREATE FUNCTION the new pre-request with SET pgrst.hook, CREATE OR REPLACE FUNCTION the old pre-request function without SET pgrst.hook - both in one transaction.
Then reload all PostgREST instances
Then drop the old pre-request function, which is not used anymore.

This would also make the system extensible to support hooks in other places than "pre-request" in the same way.

One side effect of this, that I like: We'd move more of the config right into SQL, where objects (here FUNCTION) are defined.

Sep 22 '21 12:09 wolfgangwalther

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar.

I don't understand how it works, please tell me in more detail.

Sep 22 '21 13:09 dwagin

One way to handle this would be to move away from the config option db-pre-request (or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request' or something similar.

I don't understand how it works, please tell me in more detail.

The concept of SET on a function definition is something that we tried out in #1582. We didn't merge that, yet, because of some limitations for inlining those functions, but those limits would not apply in this case here.

The basic idea is that we can set all kinds of "flags" or "parameters" via SET on a function definition. We can then parse those at schema cache creation time and decide how to use those functions. This could replace the db-pre-request option. Every function defined with a SET pgrst.hook='pre-request' would then be a pre-request function.

Sep 22 '21 13:09 wolfgangwalther

So maybe we should join the pre-request function to the main query instead

I tried the above and the query can work like this:

WITH
pgrst_pre_request AS ( SELECT test."pre_request"() ),
pgrst_source AS ( SELECT "test"."clients".*FROM "test"."clients"     )
SELECT
 (select 1 from pgrst_pre_request) AS pre_req, -- it must be before the response_headers below otherwise headers set on pre-req don't get passed to PostgREST
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

Jul 28 '22 05:07 steve-chavez

So maybe we should join the pre-request function to the main query instead

I tried the above and the query can work like this:

Don't we have the same problem with pre-request on the main query as with SET ROLE on the main query? Pre request functions are used to implement auth strategies, including a SET ROLE in them (I do!) - and that could possibly break.

I was now finally able to produce an example where joining SET ROLE to the main query would cause a problem:

create role authenticator;
create role alice;
create schema a;
grant create, usage on schema a to alice;
set role alice;
create table a.t ();
select * from a.t;
reset role;

select set_config('role', 'alice', false), current_setting('role');
-- returns: alice | alice

-- BUT:

set role authenticator;
select set_config('role', 'alice', false), current_setting('role') from a.t;
-- returns: permission denied for schema a

reset role;
grant usage on schema a to authenticator;
set role authenticator;
select set_config('role', 'alice', false), current_setting('role') from a.t;
-- returns: permission denied for table t

Since the authenticator role should have no privileges itself at all and since privileges are checked at planning time, this will always fail.

Jul 28 '22 06:07 wolfgangwalther

Pre request functions are used to implement auth strategies, including a SET ROLE in them (I do!) - and that could possibly break.

Good call, my proposed query is a no go then. Then the only way to improve perf for pre-request would be pipeline mode.

Jul 28 '22 14:07 steve-chavez

If we had pipeline mode, then https://github.com/PostgREST/postgrest/issues/1941#issuecomment-922883153 would suddenly be a lot more attractive, I think.

Aug 11 '22 18:08 wolfgangwalther

Now that we have transaction-scoped settings clearly defined, I think we can have a single config for all. Should be like:

# default
tx-settings = '{"*": {"*": ["*"]}}'

# whitelisting some headers and allowing all jwt claims
tx-settings = '{"public.table": {"request.headers": ["header1", "header2"]}, "public.func()": {"request.jwt.claims": ["*"]}}'

And with pre-config:

create or replace function postgrest.pre_config()
returns void as $$
select
  set_config('pgrst.tx_settings',
  , json_build_object('public.table'
    , json_build_object('request.headers', json_build_array('header1', 'header2'))
    )::text
  , true
  )
;
$$ language sql;

Oct 21 '23 20:10 steve-chavez

tx-settings = '{"public.table": {"request.headers": ["header1", "header2"]}, "public.func()": {"request.jwt.claims": ["*"]}}'

@steve-chavez Could you give more detail about what public.table and public.func() are in this?

Jan 15 '24 17:01 taimoorzaeem

Could you give more detail about what public.table and public.func() are in this?

@taimoorzaeem Those were meant to be a table and a function.. but I think we should take a step back. There was a recent change that affects this issue.

At schema cache creation time, we'd read the argument list of the pre-request function definition. If that reads something like CREATE FUNCTION your_pre_req ("headers.your_first_header" TEXT, "headers.your_second_header" TEXT)

@wolfgangwalther How would the above work now that we only have the JSON GUCs (https://github.com/PostgREST/postgrest/pull/3032/commits/60a90003561d39e582a02551a94f7b305f98ee53)?

So if we had:

CREATE FUNCTION your_pre_req ("request.headers" json)

This would mean all the headers would be always passed to the pre-request function, which would still have a perf impact? How could we filter the headers that get passed as an argument?

(the pre-request way looks ideal since it's the most flexible one but not sure if we can make it work)

Jan 16 '24 05:01 steve-chavez

How would the https://github.com/PostgREST/postgrest/issues/1941#issuecomment-922883153 work now that we only have the JSON GUCs (https://github.com/PostgREST/postgrest/commit/60a90003561d39e582a02551a94f7b305f98ee53)?

Still the same. The "new" JSON GUC format was only invented because of a limitation in what characters were allowed for GUC names. There is no such limitation for argument names... except maybe NAMEDATALEN. So it could be useful to think about shortening the required prefixes. I guess we don't need both request and headers, so header.xxx should be enough. Other stuff like request.method could still be prefixed with with request. jwt stuff could be without jwt prefix, too: claims.yyy, or maybe jwt.yyy (is there anything else than claims that we'd need from a jwt?).

Jan 16 '24 07:01 wolfgangwalther

Hm, so the proposal remains in that we should filter the passed headers according to the pre-request parameters. But that caused some problems with deployment as mentioned above. It would be complex to define a function with 20 arguments if the user is interested in 20 headers.

Really this looks to me like a pure proxy issue. So if we had an nginx like directive, we could do it like:

server{
  location{
    pg_pre_request 'pre_req_func' '[$http_user_agent, $http_referer]' '..'
  }
}

We don't have that kind of flexibility, but maybe we can come up with a similar solution.

Jan 16 '24 15:01 steve-chavez

But that caused some problems with deployment as mentioned https://github.com/PostgREST/postgrest/issues/1941#issuecomment-923966854.

This is not a problem with the proposal here - this is a general problem of "how to do HA with PostgREST". The core problem was identified in https://github.com/PostgREST/postgrest/issues/1941#issuecomment-924903130. You can easily work around that problem by versioning your pre-request function names to avoid any overloaded collisions:

Create new function pre_request_v2
reload all PostgREST instances with the new pre-request setting
Drop old function pre_request_v1

Yes, we do need to figure out what to do with multiple overloaded pre-request functions which all match our config. But I already made some proposals above.

It would be complex to define a function with 20 arguments if the user is interested in 20 headers.

But that's the opposite of what was the idea of this issue. If you need 20 headers, you are more than likely fine with just adding a catch-all headers argument. We can easily support both: headers for all headers and header.accept etc. for individual headers.

Jan 16 '24 16:01 wolfgangwalther

postgrest postgrest copied to clipboard

[Feature Request] HTTP headers filtering - whitelist transaction-scoped settings

postgrest
postgrest copied to clipboard