Add `pgrst.accept` setting to RPC
Allows setting a custom media type for the response format (`Accept` and `Content-Type` headers) by using the `pgrst.accept` setting on an RPC.
Basically, it serves as an escape hatch for when the user wants an `application/msgpack` or an `image/png` response. Example:
```sql
-- with this function
create or replace function ret_image(name text) returns bytea as $$
  select img from images where name = $1;
$$ language sql
set pgrst.accept = 'image/png';
```

You can obtain an `image/png` response by doing any of:

```shell
curl -H "Accept: image/png" localhost:3000/rpc/ret_image?name=A
curl -H "Accept: */*" localhost:3000/rpc/ret_image?name=A
curl localhost:3000/rpc/ret_image?name=A
```
Currently it doesn't handle a wildcard media subtype (like `image/*`), but it does handle the catch-all media type wildcard (`*/*`). That is good enough for cases like browser `<img>` requests or right click -> view image requests (check these tests and #1462).
`image/*` can be handled in a later enhancement.
Related issues:
- Fixes #1462 and partially addresses #1548
- Deprecates `raw-media-types`
- Continues the work on https://github.com/PostgREST/postgrest/pull/1546
- Also fixes appending an erroneous `charset=utf-8` on binary output. Reported on the gitter chat.
Love it!
Couldn't find a test case for it - maybe it's worth adding: what happens when you do `curl -H "Accept: different/type" localhost:3000/rpc/ret_image?name=A`? And what happens if you use `application/json` (or any of the "default" types)? I would expect both to throw.
Another thing is wildcard matching: I agree with you that wildcards in the request can be a later addition. However, I think we really need to allow wildcards in pgrst.accept. So if my function is not ret_image but ret_file (for arbitrary file types), we would at least need to allow pgrst.accept='*/*' now. I think it's rather unlikely that people will have a function that returns just e.g. image/png and not some other file types as well? In all the use-cases I had so far, I had to return different mimetypes.
@wolfgangwalther Thanks! I've added some test cases for default and unknown types.
> we would at least need to allow `pgrst.accept='*/*'` now.
Oh, that's a new one. On the single mimetype case I've added now, we're in fact accepting `*/*` (the default when no `Accept` header is sent) and then responding with the particular mimetype.
(I got confused here with the `accept` keyword, since we also set the type. Maybe we should use `pgrst.mime` for the single mimetype case)
So with `*/*` we'd be on the multiple mimetype case and we let the user set the mimetype on the function body.
> In all the use-cases I had so far, I had to return different mimetypes.
How would the user decide which mimetype to set in the body? For example, if chrome sends `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8`, should the user parse that in SQL and order the mimetypes according to q parameters?
One thing that is great about being static and specific on the mimetype (single or list) is that it would allow us to document them on our OpenAPI output (check openapi mime types). Wildcards and setting the content type on the function body wouldn't allow us to do that.
> we would at least need to allow `pgrst.accept='*/*'` now.
>
> Oh, that's a new one. On the single mimetype case I've added now, we're in fact accepting `*/*` (the default when no `Accept` header is sent) and then responding with the particular mimetype.
But that's the other way around from what I suggested, correct? So client requests */* and we have pgrst.accept='single/type' set.
> (I got confused here with the `accept` keyword, since we also set the type. Maybe we should use `pgrst.mime` for the single mimetype case)
I think we should keep it consistent: the single mime type case is just the simplest case of content negotiation (see below) - it's just a yes/no decision. But since it's still content negotiation it should be `accept`. After negotiating a specific mime type in postgrest (without wildcard), we can set that mimetype as the response header `Content-Type`. This could even be possible in some multi mimetype cases - see below for the `application/xhtml+xml` case (I'm writing this backwards.. :D).
> So with `*/*` we'd be on the multiple mimetype case and we let the user set the mimetype on the function body.
Yes. But a simple multiple mimetype case to start with, because for content negotiation you don't even need to make any decision. `*/*` will just accept every request.
> In all the use-cases I had so far, I had to return different mimetypes.
>
> How would the user decide which mimetype to set in the body? For example, if chrome sends `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8`, should the user parse that in SQL and order the mimetypes according to q parameters?
This is the "Content Negotiation" part - this should be done in postgrest, not in the RPC. If we could extend that in the future, so that postgrest somehow tells the RPC (probably via GUC) on which of the possible mimetypes it has acted, that would be great. Example:
- Request is sent with header `Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8`
- The RPC on the endpoint has `pgrst.accept='application/xhtml+xml,*/*'` set
- Postgrest decides that `application/xhtml+xml` is the best match and sets `pgrst.accepted='application/xhtml+xml'` (note the difference from `accept` - it's `accepted` here)
- If the function had `pgrst.accept='image/png,*/*'` instead, postgrest would set `pgrst.accepted='*/*'` as the best match. `pgrst.accepted` would always be one of `pgrst.accept` - so the user writing the RPC would know exactly which values to expect
The multiple mimetypes I referred to are entirely based on content: I am keeping files in the database. That could be images, pdf, ... whatever the user uploads. So once a specific file is requested I know exactly which mimetype to return (the type of the file). But I need to act on any accept header. I guess right now I could rely on the browser always sending */* as part of the accept header, just set pgrst.accept='application/octet-stream' and then set the proper header in the function body.
> One thing that is great about being static and specific on the mimetype (single or list) is that it would allow us to document them on our OpenAPI output (check openapi mime types). Wildcards and setting the content type on the function body wouldn't allow us to do that.
I agree that would be great to put in the OpenAPI output. I don't think we should disallow overriding the content-type on the function body, however. Now if my function body overrides the Content-Type and the endpoint could return anything... I think an OpenAPI output like produces: */* would actually be better than e.g. produces: application/octet-stream (referring to my example above).
This seems to be allowed in OpenAPI 3.x:
> [...] For responses that match multiple keys, only the most specific key is applicable. e.g. `text/plain` overrides `text/*`
> I think we should keep it consistent: The single mime type case is just the simplest case of content negotiation
I agree, it would be a matter of being clear about the single mime case on the docs.
> This is the "Content Negotiation" part - this should be done in postgrest, not in the RPC. If we could extend that in the future, so that postgrest somehow tells the RPC (probably via GUC) on which of the possible mimetypes it has acted, that would be great.
Great idea! In fact the `parseHttpAccept` already orders (q params included) the mimes in a client `Accept` and returns a list. I think we can pick the best match (first one in the list) for the GUC.
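For illustration, a rough Python sketch of that ordering (the real `parseHttpAccept` lives in PostgREST's Haskell code; this only mimics the behaviour described here):

```python
def parse_accept(header):
    """Order the media types of an Accept header by q value (descending),
    keeping header order as the tie-breaker."""
    items = []
    for position, part in enumerate(header.split(",")):
        fields = part.strip().split(";")
        media_type = fields[0].strip()
        q = 1.0  # q defaults to 1 when not specified
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                q = float(value)
        items.append((media_type, q, position))
    # sort by q descending; original position breaks ties
    items.sort(key=lambda item: (-item[1], item[2]))
    return [(media_type, q) for media_type, q, _ in items]
```

With the chrome header from above, the first element of the result would be the best match for the GUC.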
So, right now we have these cases:
- [x] Single mime: `pgrst.accept = image/png` (Done)
- [ ] Single wildcard: `pgrst.accept = */*`
- [ ] Multiple mimes: `pgrst.accept = image/png, image/jpeg`
- [ ] Multiple mimes and wildcard: `pgrst.accept = application/xhtml+xml, text/html, */*`
How about if we handle the 3 pending cases in this way:
- Send the `request.mime` (name up for debate) GUC with the best match to let the user set the `Content-Type` according to his own logic.
- The `request.mime` GUC will only be enabled for RPC calls with `pgrst.accept`. This is to avoid a bit of overhead on normal requests.
- On this type of RPC, we'll not set a `Content-Type` by default. This is allowed according to RFC 7231.
> A sender that generates a message containing a payload body SHOULD generate a Content-Type header field in that message
>
> SHOULD: This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item
I've been checking how to do this in the code and it looks feasible. Also it would have the nice side effect of rejecting an invalid Accept without starting an empty commit on the db(currently happening).
@wolfgangwalther What do you think?
> So, right now we have these cases:
Is the "wildcard" you mention here just about a full wildcard or does it extend to "half" wildcards like image/* as well? I assume it does.
Negotiation
> I think we can pick the best match (first one in the list) for the GUC.
So, just for clarification, the negotiation algorithm you're suggesting would look like this?
```
def negotiate_content_type:
    for accepted_by_client in parseHttpAccept.ordered_list:
        for accepted_by_rpc in pgrst.accept:
            if accepted_by_client matches accepted_by_rpc:
                return most_specific_of(accepted_by_rpc, accepted_by_client)
    throw not acceptable error
```
Alternative approach
If parseHttpAccept were to return not only the ordered list, but also the q params (does it? I still can't read haskell that easily :/ ), an alternative algorithm could be to parse pgrst.accept with parseHttpAccept as well (so ordering would be handled, even including q as well!) and then:
- create all combinations of accepted client and rpc types
- filter out those that don't match
- calculate `client.q * rpc.q` for every match (q defaults to 1 if not specified) - could even add a factor of 0.5 for "partial matches" because of wildcards, if we wanted to be really smart here
- pick the match with the highest result and break a tie by order in the client header
This would be a bit like the algorithm that apache uses. Since this algorithm would be nicely encapsulated in a function anyway, I think it would not make the overall design more complex (just the negotiation algorithm of course :D).
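A Python sketch of that scoring, assuming both sides are already parsed into `(media_type, q)` tuples in header order (the 0.5 wildcard factor and the client-order tie-break are as proposed above, not anything PostgREST implements today):

```python
def media_matches(a, b):
    """True if two media types are compatible, treating '*' as a wildcard
    on either side and on either half of the type."""
    a_type, _, a_sub = a.partition("/")
    b_type, _, b_sub = b.partition("/")
    return (a_type == "*" or b_type == "*" or a_type == b_type) and \
           (a_sub == "*" or b_sub == "*" or a_sub == b_sub)

def negotiate(client_accept, rpc_accept):
    """client_accept / rpc_accept: lists of (media_type, q) tuples.
    Returns the winning (client_type, rpc_type) pair, or raises for a 406."""
    best, best_score = None, 0.0
    for client_type, client_q in client_accept:
        for rpc_type, rpc_q in rpc_accept:
            if not media_matches(client_type, rpc_type):
                continue
            score = client_q * rpc_q
            if "*" in client_type or "*" in rpc_type:
                score *= 0.5  # penalize partial (wildcard) matches
            if score > best_score:  # strict '>' breaks ties by client order
                best, best_score = (client_type, rpc_type), score
    if best is None:
        raise ValueError("406 Not Acceptable")
    return best
```

On the chrome example, an RPC with `application/xhtml+xml,*/*` wins with the exact `application/xhtml+xml` match (score 1.0) over the earlier-but-wildcard `text/html` vs `*/*` pair (score 0.5).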
Return value
I think in any case we can do better than returning either always accepted_by_client or accepted_by_rpc. That's why I put the most_specific_of function in the example, that would do something like the following:
- if the match is like `client=*/*` and `rpc=image/png` - the return should be `image/png`
- if the match is like `client=image/png` and `rpc=*/*` - the return should be `image/png` as well
- if the match is like `client=image/png` and `rpc=image/png` - of course `image/png`
- if the match is like `client=*/*` and `rpc=image/*` - the return would be `image/*`. Also the other way around, of course.
- and so on
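The cases above boil down to picking the side with fewer wildcard halves. A minimal sketch (helper names are hypothetical):

```python
def specificity(media_type):
    """0 for */*, 1 for image/*, 2 for image/png."""
    type_, _, subtype = media_type.partition("/")
    return int(type_ != "*") + int(subtype != "*")

def most_specific_of(client, rpc):
    # on a tie either side works, since both are equally specific
    return client if specificity(client) >= specificity(rpc) else rpc
```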
Content-Type
> The `request.mime` GUC will only be enabled for RPC calls with `pgrst.accept`. This is to avoid a bit of overhead on normal requests.
+1. That makes a lot of sense - RPC functions should not be concerned about the returned content-type being application/json or text/csv if postgrest handles that.
> On this type of RPC, we'll not set a `Content-Type` by default. This is allowed according to RFC 7231.
What about:
- if `request.mime` does not contain a wildcard -> set `Content-Type` to `request.mime` by default, but allow override
- if `request.mime` does contain any kind of wildcard, don't set `Content-Type` by default
I think the first one would be a sane default, that would simplify a lot of RPCs, especially those that just accept a single specific mimetype.
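That default rule is tiny; as a sketch (with `None` meaning "the function body has to set the header itself"):

```python
def default_content_type(request_mime):
    """Content-Type to set by default for the negotiated request.mime,
    or None when a wildcard forces the function body to decide."""
    if "*" in request_mime:
        return None
    return request_mime
```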
I just found this: https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions
It reads:
> A [...] function call will be inlined if all of the following conditions are met: [...] the function has no `SET` clauses in its definition
That sounds like a show-stopper for the whole `SET ...options...` idea, because inlining is important for the performance of RPCs.
What we need is another way of providing custom postgrest options for individual sql objects. Finding a more general solution here would also allow us to extend the concept here from RPCs to tables.
Maybe something like a config option db-config that points to a table. The schema cache queries could then query this table for additional config options. Optional, of course.
The table schema could be as simple as:

```sql
CREATE TABLE pgrst.config (
  oid OID,
  config JSON
);
```
in which one could insert:

```sql
INSERT INTO pgrst.config VALUES ('ret_image(text)'::regprocedure, '{"accept":"image/png"}');
```
Not sure if the oid approach is the best. It has the advantage, that renaming database objects is possible. Not sure what happens on CREATE OR REPLACE FUNCTION, though. DROP FUNCTION; CREATE FUNCTION would probably not work without adding a new row to pgrst.config as well.
Another approach is, to have type, schema and name columns, where type could be class, proc, ... - according to the pg_ tables.
> That sounds like a show-stopper for the whole `SET ...options...` idea, because inlining is important for performance of RPCs.
Hmm.. I guess we'd have to measure that to see if the perf we lose is considerable. But really the main use case for `pgrst.accept` was sending images (and removing `raw-media-types`) and those need to be cached anyway.
I think the `pgrst.accept` approach brings a good DX (the table approach is a bit more complicated) and is a good enough escape hatch for different media types. Lots of good things (experiments) can come up with this, but first we need to offer a simple interface.
So since flexibility was the original motivation and not max performance, maybe the inlining issue is not an actual show-stopper.
Still, it'd be interesting to clarify the real loss. I see things like "The rules set out here are believed to be correct for pg versions between 8.4 and 9.5" on the wiki and it looks like the info there is not conclusive.
(the wiki page is a bit complicated :O.. will need to revisit that later)
> Still, it'd be interesting to clarify the real loss. I see things like "The rules set out here are believed to be correct for pg versions between 8.4 and 9.5" on the wiki and it looks like the info there is not conclusive.
Forgot to mention that I checked the source. Unchanged in terms of the SET since 9.5. And that makes sense, because once the function is inlined in the main query, it's impossible to set some GUCs just for the inlined part - the context is lost. So this is very unlikely to have changed.
> Hmm.. I guess we'd have to measure that to see if the perf we lose is considerable. But really the main use case for `pgrst.accept` was sending images (and removing `raw-media-types`) and those need to be cached anyway.
Without inlining all the filters and limits etc. will be applied on the materialized result of the function call. This means, especially with big amounts of data like files, that the filters have to be re-implemented in the function itself. In most cases, this will be a simple PK lookup from a function argument, in that case it will not affect performance, that's true. But using any of the query syntax for filtering or using limit and offset will quickly not be possible anymore.
Re-implementing any of the filter behaviour will be much more complex than the config table :/
I am not sure how this affects performance for resource embedding.
We should be able to run a few tests with the current state of this PR, right? Eh... we would need a branch for comparison that allows inlining properly. Currently it's blocked in general, because of some other conditions not being met.
So maybe we should first make sure that we have queries that allow inlining for regular RPCs - could give a performance improvement as well. And once we have that, we can compare this in the specific use-case of pgrst.accept.
See #1652 for a specific case where the consequences for performance of broken inlining are shown.
TLDR: I'm proposing a solution that will:
- allow us to use the `SET pgrst.accept` concept
- allow us to inline RPC calls
- solve the "accept custom mimetypes for tables and views" problem nicely
This solution is a bit of work to implement - but once we get this done, this should be a major improvement and really useful.
Before going into detail about what we should do, I will outline the inlining problem again, with a couple of examples that show why we need to support it for performance.
Inlining
What is inlining?
Why do we need to make sure that our RPC function calls can be inlined?
When a function call is inlined, the function body is put directly into the main query and the whole query is parsed as one big query. Here's an example:
```sql
CREATE FUNCTION search_client (query TEXT) RETURNS SETOF clients
STABLE LANGUAGE SQL AS $$
  SELECT * FROM clients WHERE name LIKE '%' || query || '%'
$$;

SELECT * FROM search_client('john doe') LIMIT 10;
```
Let's ignore for a second that to my knowledge it's not possible to add an index that would cover the name LIKE ... - the same can be done with full text search functions, but that would complicate matters too much here.
If the function call is inlined the query will be treated as:
```sql
SELECT * FROM (
  SELECT * FROM clients WHERE name LIKE '%' || 'john doe' || '%'
) AS search_client LIMIT 10;
```
This can be optimized as one query - so the `LIMIT` will be pushed inside the subquery and once 10 rows are found that match the condition, the query is done. On a very big table with many John Does, this will be fast.
Now assume that the query can not be inlined. One way to do this would be to remove the `STABLE` from the function definition. In this case the function would be `VOLATILE` and can't be inlined. Now the function call acts as an optimization fence. That means that the query inside the function is executed, the whole table is scanned for all rows that match the condition, the result is put together (if I remember correctly, as an array) and only then is the `LIMIT` applied, so the first 10 rows of that resultset are taken. On a big table this will be slow.
See #1652 for a case where exactly this is happening. See https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions for all the conditions that must be satisfied for inlining.
Why do we need inlining for RPCs returning a custom mimetype?
Of course there are use-cases where inlining is not important. This is the case when the resultset inside the function call is not further reduced by other parameters. That basically means: without inlining, all the query parameters for filtering and limiting will come with a performance hit for RPCs. However if we pass in a PK column via function argument and query for exactly 1 row, inlining does not matter.
Here are some examples that could use custom mimetypes / accept header, but need query parameters for filtering / limiting:
- Assume someone needs to output Line-delimited JSON (came up here: https://github.com/PostgREST/postgrest/issues/278#issuecomment-142892374, also see https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON) or any other format not supported out of the box by PostgREST. They could use a custom mimetype and then handle that in an RPC. Of course, in this case they don't want to always transform the full table of records before applying filtering and limiting. They need inlining.
- Assume a table that holds files in a `BYTEA` column and has multiple other columns. The files are of different mimetypes - so we need to call this through an RPC to set the `Content-Type` header correctly. The user should be able to filter on different columns and apply a `LIMIT 1` to get a valid file back. If those filters were implemented as function arguments, we would need a lot of overloaded functions for all kinds of filter combinations to make up the final query, or we would need to dynamically create the final query. Inlining and pushing through conditions and limits is much easier.
For performance, we absolutely need to support inlining most of the time.
SET on the function definition prevents inlining
It's easy to understand why: once `SET` is used on the function definition, the GUC that is set will have that value for the duration of running the function, so only "inside". Once you inline the query in the main query and treat everything as one, there is only one scope and the `SET` would apply to the whole query. This would result in different (and possibly unpredictable) behaviour, so those functions can not be inlined.
However, we would be using SET pgrst.accept just as a hint for our schema cache to know which function supports which mimetypes - we wouldn't even need the GUC inside the function body at all.
It seems like a serious waste of performance to prevent inlining just to use the nice way of hinting directly on the function body.
... but - what if we could do both? Read on! :)
Virtual / computed columns to our rescue!
One precondition for PostgREST returning a custom mimetype is always that this must be in the form of a `TEXT` or `BYTEA` column - because we need to just pass it right through to the client, without any post-processing (that is the whole point after all...). Not just any column - exactly one column. What if... this column was a computed column instead?
Computed columns are defined through functions that take exactly 1 argument of the base table's type. We can identify all functions that provide computed columns in the schema cache and map them to their base tables / types. This would give us the positive side-effect, that we could add those columns to the OpenAPI output as well!
Now we do the following:
- If the computed column's function has `SET pgrst.accept` and
- the request's `Accept` header is matching that
we return the custom mime-type (just as discussed above in this PR and in other issues before). We do this by ignoring the `?select=` query parameter and using just the equivalent of `?select=computed_column`. The select parameter only makes sense when we post-process with PostgREST anyway. Remember: a custom mimetype is always 1 column returned. The computed column can handle merging multiple columns together. However, even if we're not making use of the select with PostgREST, we can still pass this parameter on to the computed column function via GUC, so that this function can apply the select internally, if there is a need for it.
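The dispatch described above could be sketched like this (all names are hypothetical; `computed_columns` maps a computed column name to its declared `pgrst.accept` value, and a real implementation would weigh q values instead of this crude membership check):

```python
def accept_offers(accept_header):
    """Media types listed in the Accept header, parameters stripped."""
    return [part.split(";")[0].strip() for part in accept_header.split(",")]

def plan_select(requested_select, accept_header, computed_columns):
    """Decide what the main query selects.
    Returns (select_list, raw_output) - raw_output=True means the single
    computed column is passed through without PostgREST post-processing."""
    offers = accept_offers(accept_header)
    for column, mime in computed_columns.items():
        if mime in offers or "*/*" in offers:
            # custom mimetype matched: ignore ?select= and use this column only
            return [column], True
    return requested_select, False
```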
This would allow us...
... to query tables or views with custom mime-types:
```sql
CREATE TABLE clients (
  name TEXT,
  budget MONEY,
  whatever OTHER
);

CREATE FUNCTION yaml (clients) RETURNS TEXT
IMMUTABLE LANGUAGE SQL
SET pgrst.accept='text/vnd.yaml' AS $$
  <some yaml transformation happening here>
$$;
```
Now you can request this with:
```
GET /clients?budget=gt.BIGNUMBER
Accept: text/vnd.yaml
```
And get a nice return of clients in yaml format.
... to create "generic" custom output formatters:
Not really generic, but we can add overloads for yaml (clients), yaml (projects), yaml (users)... great flexibility!
... to use the same on RPCs with composite return types ("smart views"):
Those RPCs have to return a base table type, that can be used for computed columns. But they have to anyway, because otherwise inlining will again be prevented. See the inlining conditions mentioned above.
Those can then be queried just like the tables. This also allows to return different output formats for a single RPC, depending on which computed column is used!
Note: RPCs that return any form of scalars are a bit special here. The best solution here is probably to just use SET pgrst.accept on the RPC itself as originally planned. All the filtering on the different columns doesn't apply here, so those filters do not need to be pushed inside anyway. The only thing that would need to be pushed in would be a LIMIT, but that can be added as an argument as well.
Why does this work?
"But.. if we do it like this, then the computed columns functions can not be inlined, right? And we need inlining!".
Yes and no. We don't need inlining for computed columns, because those will do a transformation from a row of the base table to a single value. There is nothing to gain from inlining here, as there can't be any conditions or limits pushed inside. This transformation has to happen in a function call. Most likely those functions would have other characteristics that would prevent inlining anyway...
Note: It's not entirely right that we never need inlining for computed columns, but not in this case. Whenever we use computed columns in the SELECT part of the query (and the whole custom mimetype stuff is really JUST that), we don't need inlining. When we use computed columns in other parts of the query, e.g. WHERE, we can very much benefit from inlining, because that might allow index usage. But this is completely unrelated here.
If we can pull this off, this would be a really powerful feature. We need quite a few parts to play together, to get this right, but in my opinion, it will be well worth it.
A couple more notes:
> Note: RPCs that return any form of scalars are a bit special here. The best solution here is probably to just use `SET pgrst.accept` on the RPC itself as originally planned. All the filtering on the different columns doesn't apply here, so those filters do not need to be pushed inside anyway. The only thing that would need to be pushed in would be a `LIMIT`, but that can be added as an argument as well.
It makes sense to implement `pgrst.accept` not only on computed columns, but also directly on the RPCs. Once for RPCs that return scalars (can't use computed columns here) and also because some scenarios (as demonstrated in this PR, the straightforward "select 1 row by PK" query) are considerably easier to handle with that and don't need inlining anyway.
Regarding inlining of RPCs in general: In the majority of cases, this is not possible right now, with the way the queries are constructed. So allowing inlining can improve performance in the future, but this PR will not lead to a degradation in performance given the way we call RPCs right now.
Therefore, I conclude that the approach in this PR can be continued as-is, without lowering our chances of implementing faster options on top of that. Once we have all the pieces in place, it will be mostly a documentation issue, to tell people how to write RPCs with pgrst.accept (and also without!) properly to have them run fast!
@wolfgangwalther The computed columns approach is genius! :100: :1st_place_medal: :boom: :exploding_head:
So simple and yet so powerful :muscle:.
Couldn't think of a way to also make it work with a RPC returning a scalar. But as you mention, we can be clear as when not to do this on the docs.
So I'll revisit and finish this one later. I'll let you implement your great idea with computed columns on another PR.
Once we finish these we can call https://github.com/PostgREST/postgrest/issues/1548 solved.
> I'll let you implement your great idea with computed columns on another PR.
Arghhhh :D
> Arghhhh :D
:laughing: I thought you might like to get full credit for that one. But I'll help in reviewing :D
Just another idea here, so we don't forget.
> ... to create "generic" custom output formatters:
>
> Not really generic, but we can add overloads for `yaml (clients)`, `yaml (projects)`, `yaml (users)`... great flexibility!
We can have a true generic solution by implementing the same for custom aggregate functions. Once we find an aggregate that has a finalfunc with SET pgrst.accept, we can call this instead of json_agg. This would allow e.g. generic transforms from json -> yaml or other formats. Maybe CSV output could even be implemented like that, because it would help nicely with adding headers or footers to multiple rows of custom mimetype. Generating a zip file out of multiple files would be another one... ah endless possibilities.
The more I think about it, the more I think we should keep this simple and limit it to supporting aggregate functions only. No SET pgrst.accept on RPCs directly or on virtual columns. Aggregation is really the right stage to handle custom output formats - just like we do the transformation to json or csv at that stage right now, or use string concat for text and binary output formats.
All other approaches have serious drawbacks:
- RPC: No support for tables/views, potentially serious performance issues.
- Virtual Columns: Only work reliably for single entities. Imagine requesting a list of files with different mimetypes - one call to set the header would be overridden by the next and the result would be a mess. At the same time, we can't limit this to return a single row only, as this would not allow implementing other mimetypes that support multiple entities (e.g. YAML etc.).
This could be solved by using aggregate functions:
- To qualify, an aggregate function needs to have both `SET pgrst.accept=` for content negotiation and a return type of either `TEXT` or `BYTEA`.
- To match aggregate functions with routes/database objects (tables, views, stored procedures) the type of the first argument is taken:
  - If it's a composite type, the aggregate will be matched with the table/view (if it's a table's row type) and any RPC that returns this type. To be able to pass the row to the aggregate, we need to change our select part of the query slightly to return the full row as one column, like this: `SELECT table FROM schema.table`. This will not support any `?select=` in the query string, because the aggregation function makes the decision which columns from the row to take.
  - If it's a scalar type it will match with any RPC returning that type. `?select=` is not supported anyway in this case.
  - If it's `JSON` or `JSONB` any table/view/RPC is supported. This uses the regular query including `?select=` and embedding support and allows for generic handlers (e.g. to transform to YAML). The aggregate is called somehow like this: `SELECT yaml_agg(to_json(postgrest_t))`
This approach does support everything we need:
- Tables, views and RPCs
- Inlining of RPCs
- Works with RPCs that return scalar values, too
- Throwing errors when multiple entities are requested for a mimetype that doesn't support multiple
- Container mimetypes for multiple entities (e.g. YAML or even a ZIP-wrapper around multiple files...)
- Generic mimetype handlers (e.g. better CSV support)
Another benefit is that we don't need the whole "select one output column only" magic. Even from a table you will be able to just set your Accept header and the aggregate function takes care of choosing the right column. Much better handling in code and the API, I think.
One advantage of not implementing `pgrst.accept` for RPCs would be that we could allow setting it to enable `Content-Type` handling (so the other way around: parsing the body on e.g. POST requests). Basically, setting `pgrst.accept` on
- an aggregate function would be for output
- a regular function would be for input
Those functions would then have to take BYTEA or TEXT and return either JSON / JSONB to take the current path. Or they could return the target's table row type to avoid the intermediate conversion to json. An INSERT query should be straight-forward in that case, I think. This would allow us to parse multipart form data requests, I think. So uploading files could be easily possible, too!
> No `SET pgrst.accept` on RPCs directly
Hm, so under your new proposal, I wouldn't be able to have a function that does some analysis in pl/r and outputs a png image with a plot (like here)? I have other similar use cases and really those are the ones that excited me the most. The simplicity of just adding a GUC to the function and getting a result through the URL was most enticing to me.
> No `SET pgrst.accept` on RPCs directly
Hm. For those cases, I would probably add a generic BYTEA handler like this:
CREATE FUNCTION extra.raw_handler (state BYTEA, next BYTEA) RETURNS BYTEA
SET pgrst.accept = '*/*'
LANGUAGE plpgsql AS $$
BEGIN
  IF state IS NOT NULL THEN
    -- bytea endpoints should only return one row!
    RAISE EXCEPTION 'only a single row can be returned for this media type';
  END IF;
  IF current_setting('response.headers', true) IS NULL THEN
    -- we need a content type, so the function must set one
    RAISE EXCEPTION 'a Content-Type response header must be set';
  END IF;
  RETURN next;
END$$;
CREATE AGGREGATE extra.accept_all (BYTEA) (
SFUNC = extra.raw_handler,
STYPE = BYTEA
);
This would basically replace the whole raw-media-types stuff we currently have for RPCs. Once you have that, you don't even need to use SET on the RPC anymore - but can just return BYTEA and set the content-type header. Done.
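For illustration, such an RPC could look like this (the extra.accept_all aggregate above is only a proposal, but response.headers is an existing PostgREST GUC; ret_image and images are example names from earlier in the thread):

```sql
-- Sketch: an RPC that just returns bytea and sets its own Content-Type
-- via the existing response.headers GUC. The generic */* handler from
-- the proposal above would take care of the media type negotiation.
CREATE FUNCTION ret_image(name text) RETURNS bytea
LANGUAGE plpgsql AS $$
BEGIN
  PERFORM set_config('response.headers',
                     '[{"Content-Type": "image/png"}]', true);
  RETURN (SELECT img FROM images WHERE images.name = ret_image.name);
END$$;
```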
Hm, seems really hacky tbh. The pgrst.accept = '*/*' doesn't tell that we're producing an image/png. IIRC, we agreed it'd be better that the produced mime types would be static, that way they can be documented under openapi or another spec (request.spec could contain this info as well).
Although I must say that I have not fully grasped your proposal about aggregates. I'll revisit this one later.
The pgrst.accept = '*/*' doesn't tell that we're producing an image/png.
That's because the handler is really generic. You can now use it with many different RPCs that return BYTEA. Some might return image/png. Some might return something else.
IIRC, we agreed it'd be better that the produced mime types would be static, that way they can be documented under openapi or another spec (request.spec could contain this info as well).
I remember it as "if we have a static mimetype, we can return it in the spec". For me, that's definitely not a goal, because that would kill my primary use-case: Serving files of different mimetypes from the same endpoint. But of course, it's a nice feature for those cases, when we do have a static mimetype.
I tend to focus much more on the generic and/or multi-mimetype cases for content-negotiation. It's good that you brought up your example above - this is not really content-negotiation, though. Content-negotiation is about how to represent the same entity in a different format. So the RPC should not really have a "fixed" response type, because then... I can't negotiate anything anymore.
Going back to your example of a plot for some data, another way of doing it would be more RESTy with content-negotiation, like this:
CREATE VIEW public.my_cool_dataset AS
SELECT x, y FROM private.data;
CREATE FUNCTION extra.plot_png_handler_transition (state public.my_cool_dataset[], next public.my_cool_dataset)
RETURNS public.my_cool_dataset[]
LANGUAGE SQL AS $$
SELECT pg_catalog.array_append(state, next);
$$;
CREATE FUNCTION extra.plot_png_handler_final (data public.my_cool_dataset[])
RETURNS BYTEA
SET pgrst.accept='image/png'
LANGUAGE plr AS $$
# the data argument is now an array of (x, y) records
# some crazy plotting
# set the content-type header to image/png (or rely on PostgREST to do it automatically in this static case?)
# return image/png bytestream
$$;
CREATE AGGREGATE extra.plot_png_handler (public.my_cool_dataset) (
STYPE = public.my_cool_dataset[],
SFUNC = extra.plot_png_handler_transition,
FFUNC = extra.plot_png_handler_final
);
This might be a bit more code, but now you can do cool stuff:
- To have a json representation of the dataset: GET /my_cool_dataset with Accept: application/json
- To have a plot of the same data: GET /my_cool_dataset with Accept: image/png
- Notice you suddenly have all the PostgREST filtering abilities available for your plot, too. Just add a few filters in the query string and the input to your plot_png_handler aggregate will be filtered already.
I hope this shows the power of the aggregate approach.
Although I must say that I have not fully grasped your proposal about aggregates.
Query-wise, once the image/png mimetype is negotiated for, the following line:
https://github.com/PostgREST/postgrest/blob/c2c7bbe9dd1e2908160827d7f90cacc589c33264/src/PostgREST/Query/SqlFragment.hs#L161
will just change to:
coalesce(extra.plot_png_handler(_postgrest_t.pgrst_record), '[]')::character varying
with pgrst_record being the full row of type public.my_cool_dataset. We can't pass in the anonymous result of ?select=, because that wouldn't match the type. We'd need to run SELECT my_cool_dataset AS pgrst_record FROM public.my_cool_dataset instead of our usual select for that.
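Put together, the generated query would roughly take this shape (a sketch of the description above, not actual PostgREST output):

```sql
-- "my_cool_dataset AS pgrst_record" selects the whole row as a
-- composite value, so it matches the aggregate's input type.
WITH _postgrest_t AS (
  SELECT my_cool_dataset AS pgrst_record
  FROM public.my_cool_dataset
  -- plus WHERE / ORDER BY / LIMIT derived from the query string
)
SELECT coalesce(extra.plot_png_handler(_postgrest_t.pgrst_record), '[]')::character varying
FROM _postgrest_t;
```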
When writing the last comment, some ideas for enhancement came to mind. I'll leave them here to not forget, because this is currently the best place to do so:
- Aggregates can be passed additional arguments, that are available in the final func as extra arguments. We could use those to provide all kinds of request data to the handler, if we were to decide to implement the non-guc-interface.
- One of those could be the content-type that was negotiated for - useful for handlers that act on multiple mimetypes, possibly. Imagine you could make your plot handler image/* and then transform the image to either png, jpeg, bmp, whatever before returning from the handler. The client could then request the representation they can deal with best.
- Another one could be a json map of all the mimetype parameters that were passed
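As a sketch of that first idea (entirely hypothetical: neither the extra final-function arguments nor pgrst.accept exist; all names are invented):

```sql
-- Hypothetical: a final function receiving the negotiated media type as
-- an extra argument, so one handler can serve image/png, image/jpeg, ...
CREATE FUNCTION extra.img_handler_final (data bytea, negotiated_type text)
RETURNS bytea
SET pgrst.accept = 'image/*'
LANGUAGE plpgsql AS $$
BEGIN
  -- convert "data" to the requested format based on negotiated_type
  -- (e.g. with a pl/python or pl/r imaging library), then return it
  RETURN data;
END$$;
```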
Any advances here?
After
- #2268
- #2300
... this would be the last step to support XML (#2188)
@fjf2002 My original idea was pretty simple (just adding set pgrst.accept = 'type/media-type' to a function) but Wolfgang found (comment above) that it would bring performance problems because of function inlining.
Since then I've lost track of the implementation to be honest but I do agree with the aggregates proposal. If you'd like to give this a shot in another PR I'll gladly review it.
I am not 100% sure if I understand everything correctly:
- My use case is: an RPC with an unnamed xml param and scalar xml return value, to implement a SOAP endpoint.
- Returning xml already works with #2268 - but Content-Type: text/xml is still missing in the response.
Isn't the aggregate stuff overkill for my use case?
~~Of course I can remain setting the response header manually in the RPC using~~
~~PERFORM set_config('response.headers', '[{"content-type": "text/xml; charset=utf-8"}]', true);~~
EDIT: That was wrong. The correct answer is: As a workaround, I add Accept: text/xml in the reverse proxy when making the request.
Of course I can remain setting the response header manually in the RPC using PERFORM set_config('response.headers', '[{"content-type": "text/xml; charset=utf-8"}]', true);
@fjf2002 Ah, actually that already solves your use case right? This proposal would have more or less the same result. Or what's missing?
@fjf2002 Ah, actually that already solves your use case right?
~~Yes. It's no real problem for me to stay with the set_config('response.headers'... line.~~
EDIT: Well, it seems that I have mixed up different stuff. The correct answer is: As a workaround, I add Accept: text/xml in the reverse proxy, when making the request.
Since doing SET on the function would bring performance problems, how about annotating the function through our config? A similar idea was proposed on https://github.com/PostgREST/postgrest/issues/2028#issuecomment-973832177.
So with our in-db config:
ALTER ROLE authenticator SET pgrst.schema.ret_xml.accept = 'text/xml';
Env var:
PGRST_<SCHEMA>_<FUNC>_ACCEPT='text/xml'
Config file:
app.<schema>.<func>.accept = 'text/xml'
how about annotating the function through our config. A similar idea was proposed on #2028 (comment).
Yes, that seems to be a good idea. That would also solve the similar case with the img howto.
@fjf2002 Cool. If you'd like to give it a shot I'll gladly review it.
Another thing that this could provide is the ability to override the default media type for tables/views as well (can be a later improvement); this way we could solve the problem in https://github.com/PostgREST/postgrest/discussions/2338