pgsql-ogr-fdw
pgsql-ogr-fdw copied to clipboard
Expose ogr_fdw_info functions via SQL
I've found it useful to be able to introspect OGR datasources as the PostgreSQL server can see it — SQL equivalents of ogr_fdw_info commands.
-
Add a function to list available OGR layers from an FDW server:
SELECT ogr_fdw_layers('myserver');ogr_fdw_layers ---------------- Cities Countries (2 rows) -
Add a function to get the
CREATE FOREIGN TABLESQL for a particular OGR layer, the same asIMPORT FOREIGN SCHEMAuses. This can help for issues where OGR is reporting column types wrongly, or some columns aren't needed but it's mostly correct.ogr_fdw_table_sql(server_name, ogr_layer_name, table_name=NULL, launder_column_names=TRUE, launder_table_name=TRUE)SELECT ogr_fdw_table_sql('myserver', 'pt_two');ogr_fdw_table_sql --------------------------------- CREATE FOREIGN TABLE "pt_two" ( fid integer, "geom" geometry(Point, 4326), "name" varchar, "age" integer, "height" real, "birthdate" date ) SERVER "myserver" OPTIONS (layer 'pt_two');
Not sure on preferences whether you'd prefer the functions to take OIDs (eg: SELECT ogr_fdw_layers(myserver);) instead of names; and suggestions also very welcome on improvements for naming or docs.
I think this is useful but I'm +0 on the API.
- Layer listing is good.
- Need some way to play with connection strings to find one that works.
- Pretty sure that Oid/Regclass is what we want if we're referencing a database object, otherwise for strangely named things you end up wrestling at cross purposes with escape sequences.
- How crazy would just having
ogr_fdw_info(server text, layer text default null) returns textexact analogues to CLI? - I feel like a missing key function is
ogr_read(server text, layer text default null) returns setof(record)but you don't have to do that one, I'm just putting it here for completeness.
Design goal is basically to provide the affordances of the existing software, but removing the need for locally installed software, yes?
Design goal is basically to provide the affordances of the existing software, but removing the need for locally installed software, yes?
Yes, so you don't need ogr-fdw built+installed on both the client and the PG server. And also to fix random schema issues by getting the SQL since ogr_fdw_info doesn't support launder options.
I think this is useful but I'm +0 on the API.
👍 Happy to iterate.
- Pretty sure that Oid/Regclass is what we want if we're referencing a database object, otherwise for strangely named things you end up wrestling at cross purposes with escape sequences.
Yeah, that's what I was leaning towards.
- Need some way to play with connection strings to find one that works.
- How crazy would just having ogr_fdw_info(server text, layer text default null) returns text exact analogues to CLI?
I feel like it might get confusingly overloaded?
ogr_fdw_info(datasource text) returns text; -- CREATE SERVER...
ogr_fdw_info(server oid, layer text) returns text; -- CREATE FOREIGN TABLE...
How about:
ogr_fdw_info_server(ogr_datasource text, [...options]) returns text; -- CREATE SERVER...
ogr_fdw_info_table(server oid, layer text, [...options]) returns text; -- CREATE FOREIGN TABLE...
ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server
- I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.
I guess there's some potential security questions here too if we're executing from ogr datasource strings? Since calling server-side functions is potentially a different class of permissions than doing CREATE SERVER/CREATE FOREIGN TABLE; and GDAL is making remote network requests.
How about:
ogr_fdw_info_server(ogr_datasource text, [...options]) returns text; -- CREATE SERVER... ogr_fdw_info_table(server oid, layer text, [...options]) returns text; -- CREATE FOREIGN TABLE... ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server
This latter won't work so great I don't thing, as the text form will shadow the oid form (since most people access the oid type by feeding in a text string and letting the auto-cast do its magic.
- I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.
I guess there's some potential security questions here too if we're executing from ogr datasource strings? Since calling server-side functions is potentially a different class of permissions than doing
CREATE SERVER/CREATE FOREIGN TABLE; and GDAL is making remote network requests.
Yeah, I hadn't thought about the extent to which the FDW API was providing some free security cover. I mean, we could just make the ogr_read be a superuser-only function, people can grant security definer on it if they want to swim naked. (BTW, you don't have to write ogr_read, that's on my personal todo/wish list.
ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW serverThis latter won't work so great I don't thing, as the text form will shadow the oid form (since most people access the oid type by feeding in a text string and letting the auto-cast do its magic.
I kinda assumed it'd be the difference between:
SELECT ogr_fdw_info("myserver"); -- oid
SELECT ogr_fdw_info(myserver); -- oid
SELECT ogr_fdw_info('WFS:https://wfs.example.com/wfs'); -- text
But maybe there's some special TIL typing coercion that happens by default?
I mean, SELECT lower('foo') and SELECT lower("foo")/lower(foo) are fundamentally different, that's SQL.101?
(I'll accept "it's still too confusing, don't do it" 😁 )
I mean, we could just make the ogr_read be a superuser-only function
ok. Maybe ogr_fdw_info() & ogr_fdw_layers() too? They're still doing remote network requests.
(I'll accept "it's still too confusing, don't do it" 😁 )
Nope I just have my head up my ass. Ignore me and procede.
How about:
ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server
What is the expected output of ogr_fdw_info_layers? If it's simply a set of layer names, I think output being a
SETOF text
makes more sense.
I've been somewhat bogged down in other stuff, but I'll get back to finishing this soon hopefully :-)
If it's simply a set of layer names, I think output being a
SETOF textmakes more sense.
TIL SETOF text for returning multiple rows of a single column; my previous understanding was you needed SETOF record for anything returning multiple rows.
So, yes it does 👍
SETOF text or SETOF anytype will return multiple records and will just have the function as the name of column if no alias is provided.
The main difference is SETOF record or RETURNS TABLE would allow multiple columns per row but you have to deal with that messy business of using RETURNS TABLE (defining the columns) or using OUT params, which is kinda pointless if you are only returning one column.
Take for example the postgis function ST_SubDivide, it returns a SETOF geometry and you use it like any other table function.
CREATE OR REPLACE FUNCTION st_subdivide(
geom geometry,
maxvertices integer DEFAULT 256,
gridsize double precision DEFAULT '-1.0'::numeric)
RETURNS SETOF geometry
LANGUAGE 'c'
COST 5000
IMMUTABLE STRICT PARALLEL SAFE
ROWS 1000
AS '$libdir/postgis-3', 'ST_Subdivide'
;
But that said, I don't think there is any performance penalty doing it your way, just that you have a slightly longer definition.
Getting back around to this...
I kinda assumed it'd be the difference between:
SELECT ogr_fdw_info("myserver"); -- oid SELECT ogr_fdw_info(myserver); -- oid SELECT ogr_fdw_info('WFS:https://wfs.example.com/wfs'); -- textBut maybe there's some special TIL typing coercion that happens by default?
Unsurprisingly, it turns out I am wrong 😁
rcoup=# SELECT ogr_fdw_info_layers(myserver);
ERROR: column "myserver" does not exist
And while there's various regclass/etc alias types for tables, users, types, functions, etc there isn't one for FDW servers. I suspect that's probably a useful addition to PG itself, but it doesn't help right now.
Only similar thing I could find is dblink_connect(text connstr) which takes either a DB connection string or "the name of an existing foreign server".
What would you prefer for the ogr_fdw_info_layers()?
a. take a text arg which uses the arg to lookup a foreign server by name, and falls back to using it as an OGR datasource string b. have two separate functions?
I guess we must accept the server name as text...
@rcoup @pramsey just checking where we are with this.
Kinda stalled out. Almost there functionally I think? It's just less coolio than I had hoped ;)
Kinda stalled out. Almost there functionally I think? It's just less coolio than I had hoped ;)
What cool thing were you hoping for? Taking server name as not text but a real fdw_server object.
or the other cool thing you spoke about ogr_read, which I assume is the cool thing I'm looking for which is similar to a dblink like function that takes a connection and and sql and returns a set of RECORD or a JSON object
I know you frowned on that cause you really wanted to be able for it to unravel the structure, but that feature of undefined I find pretty useful to handle types that don't map cleanly to PostgreSQL or where my app expects the same structure which is gasp JSON, cause most web apps output JSON anyway these days.. So I would want a function that can output JSON anyway even if you had a function that returned a completely structured output.
I use dblink a lot to connect to other postgresql servers since postgres_fdw lacks that functionality to do adhoc queries. Would be cool to be able to pass adhoc ogr_fdw_sql statements like you said. But I'd probably want it to be called something like
ogr_read_as_table, ogr_read_as_json
But that said, that wish should probably be separate from this pull request anyway to prevent scope creep.
I guess I was hoping to be able to address things as regclass rather than having to lookup names. You now, just seems yucky.