postgrest
postgrest copied to clipboard
[Feature Request] HTTP headers filtering - whitelist transaction-scoped settings
It's good to introduce an option for filtering HTTP headers passed to PostgreSQL.
For example, request-headers-allowlist
, with default value is *
(meaning all headers are passed).
There're many unnecessary set_config()
commands in every transaction.
I've gotten 22 params at all, where a 50% of them are HTTP headers, a lot of people doesn't use (or uses small portion) it. We use only two of 11.
There're many unnecessary set_config() commands in every transaction.
Related to https://github.com/PostgREST/postgrest/issues/1857. We'll need to set all the headers in a single set_config
for postgres 14.
When we do that, perhaps passing all the headers won't be a problem?
For example, request-headers-allowlist, with default value is * (meaning all headers are passed).
I see no harm in offering such a config option though, it would reduce the passed json payload for #1857 as well.
A different approach, compared to another config option, that achieves the same thing, but is much more flexible down the road is mentioned in https://github.com/PostgREST/postgrest/pull/1710#issuecomment-753306915.
I see no harm in offering such a config option though, it would reduce the passed json payload for #1857 as well.
@steve-chavez, of course, proposed enhancement would reduce JSON-payload, that decrease request overhead time.
A different approach, compared to another config option, that achieves the same thing, but is much more flexible down the road is mentioned in #1710 (comment).
@wolfgangwalther, your enhancement doesn't filter headers on PostgREST level, only on PostgreSQL level. Therefore, these features can complement each other, not replace.
@wolfgangwalther, your enhancement doesn't filter headers on PostgREST level, only on PostgreSQL level.
That's not correct. Implementing the full proposal would allow filtering the headers on the PostgREST level already. It would be set up via arguments to a pre-request function - but those headers that are not used, will never be sent to PostgreSQL.
@wolfgangwalther So I don't understand how it works, please tell me in more detail.
@wolfgangwalther So I don't understand how it works, please tell me in more detail.
The proposal starts by stripping all set_config(...)
s - so you'd start from zero. No headers transmitted to PostgreSQL at all.
At schema cache creation time, we'd read the argument list of the pre-request function definition. If that reads something like CREATE FUNCTION your_pre_req ("headers.your_first_header" TEXT, "headers.your_second_header" TEXT) ...
only those two headers would be passed as arguments to the pre-request function. If you needed the headers in any other place than the pre-request function body, you could then call set_config(...)
yourself in the pre-request function to make it available.
Thereby the filter will apply at schema cache creation time - and therefore at the PostgREST level, before getting to PostgreSQL at all during the actual query.
Does that make sense?
@wolfgangwalther The pre-request
approach would impact performance though(one extra roundtrip to the db), we discussed that somewhere. Unless we solve that part(by joining the pre-request
call to the initial set local
s), the request-headers-allowlist
config is looking much simpler.
@wolfgangwalther now everything is clear. IMHO main issue of this solution is function parameters, it is not very flexible and difficulty updating under load.
I would prefer something like CREATE FUNCTION your_pre_req (headers json) ...
+ request-headers-allowlist
. This solves the PG14 compatibility issue and provides flexibility.
@steve-chavez of course pre-request
must be join to initial statement, one nuance as far as I know the order of the call is not defined in SELECT func1(), func2(), ...
, so the pre-request
must be defined with schema.
Unless we solve that part(by joining the
pre-request
call to the initialset local
s)
Yes, I assumed that would be part of implementing that. But I can see how we'll face the same problem with SET ROLE
here, that we would face in the main query.
So maybe we should join the pre-request
function to the main query instead - and keep the SET ROLE
(without all the other set_config
s) as the first step. After all, we're quite certain that we can run the pre-request function before everything else - just not 100% as in "we can guarantee it will never happen". That's why we can't use it for SET ROLE
, as this would impact security potentially.
@wolfgangwalther now everything is clear. IMHO main issue of this solution is function parameters, it is not very flexible and difficulty updating under load.
I would prefer something like CREATE FUNCTION your_pre_req (headers json) ...
Well, the proposal includes the ability to create a headers json
argument to get all headers.
I don't really see the difficulty of updating under load, yet, but I haven't thought about that too much either. Could you explain where you see the problem?
Well, the proposal includes the ability to create a
headers json
argument to get all headers.
Yes, I understand that needs to be added request-headers-allowlist
.
I don't really see the difficulty of updating under load, yet, but I haven't thought about that too much either. Could you explain where you see the problem?
For example:
- We have two PostgREST for HA
- We want to add additional header handling
Our actions:
- Add new function with additional parameter (we must continue to serve requests)
- Reload the schema cache of the first PostgREST (which function will PostgREST choose?, functions should be with different names?)
- Reload the schema cache of the second PostgREST
- Drop old function
Our actions (single JSON):
- Add new header to
request-headers-allowlist
- Reload all PostgREST instances (order is not important)
- Replace
pre-request
function
I think the first case is more difficult than the second. Especially when DevOps and DBA are different people.
Thanks for the walk-through.
2\. Reload the schema cache of the first PostgREST (which function will PostgREST choose?, functions should be with different names?)
I think this is the core of the issue. Once we create pre-request functions with different arguments, we can overload the same function. This will then quickly lead to a situation, where PostgREST needs to decide which of those to call later, while reloading the schema cache.
One way to handle this would be to move away from the config option db-pre-request
(or whatever it's name is currently) and make it a CREATE FUNCTION ... SET pgrst.hook='pre-request'
or something similar. If multiple functions are registered for the same hook, then all of them would be called.
This would allow to:
-
CREATE FUNCTION
the new pre-request withSET pgrst.hook
,CREATE OR REPLACE FUNCTION
the old pre-request function withoutSET pgrst.hook
- both in one transaction. - Then reload all PostgREST instances
- Then drop the old pre-request function, which is not used anymore.
This would also make the system extensible to support hooks in other places than "pre-request" in the same way.
One side effect of this, that I like: We'd move more of the config right into SQL, where objects (here FUNCTION
) are defined.
One way to handle this would be to move away from the config option
db-pre-request
(or whatever it's name is currently) and make it aCREATE FUNCTION ... SET pgrst.hook='pre-request'
or something similar.
I don't understand how it works, please tell me in more detail.
One way to handle this would be to move away from the config option
db-pre-request
(or whatever it's name is currently) and make it aCREATE FUNCTION ... SET pgrst.hook='pre-request'
or something similar.I don't understand how it works, please tell me in more detail.
The concept of SET
on a function definition is something that we tried out in #1582. We didn't merge that, yet, because of some limitations for inlining those functions, but those limits would not apply in this case here.
The basic idea is that we can set all kinds of "flags" or "parameters" via SET
on a function definition. We can then parse those at schema cache creation time and decide how to use those functions. This could replace the db-pre-request
option. Every function defined with a SET pgrst.hook='pre-request'
would then be a pre-request function.
So maybe we should join the pre-request function to the main query instead
I tried the above and the query can work like this:
WITH
pgrst_pre_request AS ( SELECT test."pre_request"() ),
pgrst_source AS ( SELECT "test"."clients".*FROM "test"."clients" )
SELECT
(select 1 from pgrst_pre_request) AS pre_req, -- it must be before the response_headers below otherwise headers set on pre-req don't get passed to PostgREST
null::bigint AS total_result_set,
pg_catalog.count(_postgrest_t) AS page_total,
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
nullif(current_setting('response.headers', true), '') AS response_headers,
nullif(current_setting('response.status', true), '') AS response_status
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;
So maybe we should join the pre-request function to the main query instead
I tried the above and the query can work like this:
Don't we have the same problem with pre-request
on the main query as with SET ROLE
on the main query? Pre request functions are used to implement auth strategies, including a SET ROLE
in them (I do!) - and that could possibly break.
I was now finally able to produce an example where joining SET ROLE
to the main query would cause a problem:
create role authenticator;
create role alice;
create schema a;
grant create, usage on schema a to alice;
set role alice;
create table a.t ();
select * from a.t;
reset role;
select set_config('role', 'alice', false), current_setting('role');
-- returns: alice | alice
-- BUT:
set role authenticator;
select set_config('role', 'alice', false), current_setting('role') from a.t;
-- returns: permission denied for schema a
reset role;
grant usage on schema a to authenticator;
set role authenticator;
select set_config('role', 'alice', false), current_setting('role') from a.t;
-- returns: permission denied for table t
Since the authenticator
role should have no privileges itself at all and since privileges are checked at planning time, this will always fail.
Pre request functions are used to implement auth strategies, including a SET ROLE in them (I do!) - and that could possibly break.
Good call, my proposed query is a no go then. Then the only way to improve perf for pre-request would be pipeline mode.
If we had pipeline mode, then https://github.com/PostgREST/postgrest/issues/1941#issuecomment-922883153 would suddenly be a lot more attractive, I think.
Now that we have transaction-scoped settings clearly defined, I think we can have a single config for all. Should be like:
# default
tx-settings = '{"*": {"*": ["*"]}}'
# whitelisting some headers and allowing all jwt claims
tx-settings = '{"public.table": {"request.headers": ["header1", "header2"]}, "public.func()": {"request.jwt.claims": ["*"]}}'
And with pre-config:
create or replace function postgrest.pre_config()
returns void as $$
select
set_config('pgrst.tx_settings',
, json_build_object('public.table'
, json_build_object('request.headers', json_build_array('header1', 'header2'))
)::text
, true
)
;
$$ language sql;
tx-settings = '{"public.table": {"request.headers": ["header1", "header2"]}, "public.func()": {"request.jwt.claims": ["*"]}}'
@steve-chavez Could you give more detail about what public.table
and public.func()
are in this?
Could you give more detail about what public.table and public.func() are in this?
@taimoorzaeem Those were meant to be a table and a function.. but I think we should take a step back. There was a recent change that affects this issue.
At schema cache creation time, we'd read the argument list of the pre-request function definition. If that reads something like CREATE FUNCTION your_pre_req ("headers.your_first_header" TEXT, "headers.your_second_header" TEXT)
@wolfgangwalther How would the above work now that we only have the JSON GUCs (https://github.com/PostgREST/postgrest/pull/3032/commits/60a90003561d39e582a02551a94f7b305f98ee53)?
So if we had:
CREATE FUNCTION your_pre_req ("request.headers" json)
This would mean all the headers would be always passed to the pre-request function, which would still have a perf impact? How could we filter the headers that get passed as an argument?
(the pre-request
way looks ideal since it's the most flexible one but not sure if we can make it work)
How would the https://github.com/PostgREST/postgrest/issues/1941#issuecomment-922883153 work now that we only have the JSON GUCs (https://github.com/PostgREST/postgrest/commit/60a90003561d39e582a02551a94f7b305f98ee53)?
Still the same. The "new" JSON GUC format was only invented because of a limitation in what characters were allowed for GUC names. There is no such limitation for argument names... except maybe NAMEDATALEN
. So it could be useful to think about shortening the required prefixes. I guess we don't need both request and headers, so header.xxx
should be enough. Other stuff like request.method
could still be prefixed with with request. jwt stuff could be without jwt prefix, too: claims.yyy
, or maybe jwt.yyy
(is there anything else than claims that we'd need from a jwt?).
Hm, so the proposal remains in that we should filter the passed headers according to the pre-request parameters. But that caused some problems with deployment as mentioned above. It would be complex to define a function with 20 arguments if the user is interested in 20 headers.
Really this looks to me like a pure proxy issue. So if we had an nginx like directive, we could do it like:
server{
location{
pg_pre_request 'pre_req_func' '[$http_user_agent, $http_referer]' '..'
}
}
We don't have that kind of flexibility, but maybe we can come up with a similar solution.
But that caused some problems with deployment as mentioned https://github.com/PostgREST/postgrest/issues/1941#issuecomment-923966854.
This is not a problem with the proposal here - this is a general problem of "how to do HA with PostgREST". The core problem was identified in https://github.com/PostgREST/postgrest/issues/1941#issuecomment-924903130. You can easily work around that problem by versioning your pre-request function names to avoid any overloaded collisions:
- Create new function
pre_request_v2
- reload all PostgREST instances with the new pre-request setting
- Drop old function
pre_request_v1
Yes, we do need to figure out what to do with multiple overloaded pre-request functions which all match our config. But I already made some proposals above.
It would be complex to define a function with 20 arguments if the user is interested in 20 headers.
But that's the opposite of what was the idea of this issue. If you need 20 headers, you are more than likely fine with just adding a catch-all headers
argument. We can easily support both: headers
for all headers and header.accept
etc. for individual headers.