pgsql-ogr-fdw icon indicating copy to clipboard operation
pgsql-ogr-fdw copied to clipboard

Expose ogr_fdw_info functions via SQL

Open rcoup opened this issue 1 year ago • 8 comments

I've found it useful to be able to introspect OGR datasources as the PostgreSQL server can see it — SQL equivalents of ogr_fdw_info commands.

  1. Add a function to list available OGR layers from an FDW server:

    SELECT ogr_fdw_layers('myserver');
    
     ogr_fdw_layers
    ----------------
     Cities
     Countries
    
    (2 rows)
    
  2. Add a function to get the CREATE FOREIGN TABLE SQL for a particular OGR layer, the same as IMPORT FOREIGN SCHEMA uses. This can help for issues where OGR is reporting column types wrongly, or some columns aren't needed but it's mostly correct.

    ogr_fdw_table_sql(server_name, ogr_layer_name, table_name=NULL, launder_column_names=TRUE, launder_table_name=TRUE)

    SELECT ogr_fdw_table_sql('myserver', 'pt_two');
    
            ogr_fdw_table_sql
    ---------------------------------
    CREATE FOREIGN TABLE "pt_two" (
      fid integer,
      "geom" geometry(Point, 4326),
      "name" varchar,
      "age" integer,
      "height" real,
      "birthdate" date
    ) SERVER "myserver"
    OPTIONS (layer 'pt_two');
    

Not sure on preferences whether you'd prefer the functions to take OIDs (eg: SELECT ogr_fdw_layers(myserver);) instead of names; and suggestions also very welcome on improvements for naming or docs.

rcoup avatar Aug 08 '24 17:08 rcoup

I think this is useful but I'm +0 on the API.

  • Layer listing is good.
  • Need some way to play with connection strings to find one that works.
  • Pretty sure that Oid/Regclass is what we want if we're referencing a database object, otherwise for strangely named things you end up wrestling at cross purposes with escape sequences.
  • How crazy would just having ogr_fdw_info(server text, layer text default null) returns text exact analogues to CLI?
  • I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.

Design goal is basically to provide the affordances of the existing software, but removing the need for locally installed software, yes?

pramsey avatar Aug 08 '24 18:08 pramsey

Design goal is basically to provide the affordances of the existing software, but removing the need for locally installed software, yes?

Yes, so you don't need ogr-fdw built+installed on both the client and the PG server. And also to fix random schema issues by getting the SQL since ogr_fdw_info doesn't support launder options.

I think this is useful but I'm +0 on the API.

👍 Happy to iterate.

  • Pretty sure that Oid/Regclass is what we want if we're referencing a database object, otherwise for strangely named things you end up wrestling at cross purposes with escape sequences.

Yeah, that's what I was leaning towards.

  • Need some way to play with connection strings to find one that works.
  • How crazy would just having ogr_fdw_info(server text, layer text default null) returns text exact analogues to CLI?

I feel like it might get confusingly overloaded?

ogr_fdw_info(datasource text) returns text; -- CREATE SERVER... 
ogr_fdw_info(server oid, layer text) returns text; -- CREATE FOREIGN TABLE...

How about:

ogr_fdw_info_server(ogr_datasource text, [...options]) returns text; -- CREATE SERVER...
ogr_fdw_info_table(server oid, layer text, [...options]) returns text; -- CREATE FOREIGN TABLE...

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server
  • I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.

I guess there's some potential security questions here too if we're executing from ogr datasource strings? Since calling server-side functions is potentially a different class of permissions than doing CREATE SERVER/CREATE FOREIGN TABLE; and GDAL is making remote network requests.

rcoup avatar Aug 08 '24 23:08 rcoup

How about:

ogr_fdw_info_server(ogr_datasource text, [...options]) returns text; -- CREATE SERVER...
ogr_fdw_info_table(server oid, layer text, [...options]) returns text; -- CREATE FOREIGN TABLE...

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

This latter won't work so great I don't thing, as the text form will shadow the oid form (since most people access the oid type by feeding in a text string and letting the auto-cast do its magic.

  • I feel like a missing key function is ogr_read(server text, layer text default null) returns setof(record) but you don't have to do that one, I'm just putting it here for completeness.

I guess there's some potential security questions here too if we're executing from ogr datasource strings? Since calling server-side functions is potentially a different class of permissions than doing CREATE SERVER/CREATE FOREIGN TABLE; and GDAL is making remote network requests.

Yeah, I hadn't thought about the extent to which the FDW API was providing some free security cover. I mean, we could just make the ogr_read be a superuser-only function, people can grant security definer on it if they want to swim naked. (BTW, you don't have to write ogr_read, that's on my personal todo/wish list.

pramsey avatar Aug 12 '24 18:08 pramsey

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string
ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

This latter won't work so great I don't thing, as the text form will shadow the oid form (since most people access the oid type by feeding in a text string and letting the auto-cast do its magic.

I kinda assumed it'd be the difference between:

SELECT ogr_fdw_info("myserver"); -- oid
SELECT ogr_fdw_info(myserver); -- oid
SELECT ogr_fdw_info('WFS:https://wfs.example.com/wfs'); -- text

But maybe there's some special TIL typing coercion that happens by default?

I mean, SELECT lower('foo') and SELECT lower("foo")/lower(foo) are fundamentally different, that's SQL.101?

(I'll accept "it's still too confusing, don't do it" 😁 )

I mean, we could just make the ogr_read be a superuser-only function

ok. Maybe ogr_fdw_info() & ogr_fdw_layers() too? They're still doing remote network requests.

rcoup avatar Aug 12 '24 21:08 rcoup

(I'll accept "it's still too confusing, don't do it" 😁 )

Nope I just have my head up my ass. Ignore me and procede.

pramsey avatar Aug 12 '24 23:08 pramsey

How about:

ogr_fdw_info_layers(ogr_datasource text) returns setof(record); -- layer list for connection string ogr_fdw_info_layers(server oid) returns setof(record); -- layer list for existing FDW server

What is the expected output of ogr_fdw_info_layers? If it's simply a set of layer names, I think output being a

SETOF text

makes more sense.

robe2 avatar Sep 12 '24 19:09 robe2

I've been somewhat bogged down in other stuff, but I'll get back to finishing this soon hopefully :-)

If it's simply a set of layer names, I think output being a SETOF text makes more sense.

TIL SETOF text for returning multiple rows of a single column; my previous understanding was you needed SETOF record for anything returning multiple rows.

So, yes it does 👍

rcoup avatar Sep 12 '24 21:09 rcoup

SETOF text or SETOF anytype will return multiple records and will just have the function as the name of column if no alias is provided.

The main difference is SETOF record or RETURNS TABLE would allow multiple columns per row but you have to deal with that messy business of using RETURNS TABLE (defining the columns) or using OUT params, which is kinda pointless if you are only returning one column.

Take for example the postgis function ST_SubDivide, it returns a SETOF geometry and you use it like any other table function.

CREATE OR REPLACE FUNCTION st_subdivide(
	geom geometry,
	maxvertices integer DEFAULT 256,
	gridsize double precision DEFAULT '-1.0'::numeric)
    RETURNS SETOF geometry 
    LANGUAGE 'c'
    COST 5000
    IMMUTABLE STRICT PARALLEL SAFE 
    ROWS 1000

AS '$libdir/postgis-3', 'ST_Subdivide'
;

But that said, I don't think there is any performance penalty doing it your way, just that you have a slightly longer definition.

robe2 avatar Oct 03 '24 00:10 robe2

Getting back around to this...

I kinda assumed it'd be the difference between:

SELECT ogr_fdw_info("myserver"); -- oid
SELECT ogr_fdw_info(myserver); -- oid
SELECT ogr_fdw_info('WFS:https://wfs.example.com/wfs'); -- text

But maybe there's some special TIL typing coercion that happens by default?

Unsurprisingly, it turns out I am wrong 😁

rcoup=# SELECT ogr_fdw_info_layers(myserver);
ERROR:  column "myserver" does not exist

And while there's various regclass/etc alias types for tables, users, types, functions, etc there isn't one for FDW servers. I suspect that's probably a useful addition to PG itself, but it doesn't help right now.

Only similar thing I could find is dblink_connect(text connstr) which takes either a DB connection string or "the name of an existing foreign server".

What would you prefer for the ogr_fdw_info_layers()?

a. take a text arg which uses the arg to lookup a foreign server by name, and falls back to using it as an OGR datasource string b. have two separate functions?

rcoup avatar Jan 16 '25 11:01 rcoup

I guess we must accept the server name as text...

pramsey avatar Jan 16 '25 19:01 pramsey

@rcoup @pramsey just checking where we are with this.

robe2 avatar Oct 05 '25 13:10 robe2

Kinda stalled out. Almost there functionally I think? It's just less coolio than I had hoped ;)

pramsey avatar Oct 06 '25 17:10 pramsey

Kinda stalled out. Almost there functionally I think? It's just less coolio than I had hoped ;)

What cool thing were you hoping for? Taking server name as not text but a real fdw_server object.

or the other cool thing you spoke about ogr_read, which I assume is the cool thing I'm looking for which is similar to a dblink like function that takes a connection and and sql and returns a set of RECORD or a JSON object

I know you frowned on that cause you really wanted to be able for it to unravel the structure, but that feature of undefined I find pretty useful to handle types that don't map cleanly to PostgreSQL or where my app expects the same structure which is gasp JSON, cause most web apps output JSON anyway these days.. So I would want a function that can output JSON anyway even if you had a function that returned a completely structured output.

I use dblink a lot to connect to other postgresql servers since postgres_fdw lacks that functionality to do adhoc queries. Would be cool to be able to pass adhoc ogr_fdw_sql statements like you said. But I'd probably want it to be called something like

ogr_read_as_table, ogr_read_as_json

But that said, that wish should probably be separate from this pull request anyway to prevent scope creep.

robe2 avatar Oct 07 '25 12:10 robe2

I guess I was hoping to be able to address things as regclass rather than having to lookup names. You now, just seems yucky.

pramsey avatar Oct 07 '25 19:10 pramsey