Pyrseas icon indicating copy to clipboard operation
Pyrseas copied to clipboard

dbtoyaml - analyze functions arguments and returns type

Open excavador opened this issue 6 years ago • 7 comments

Would be very nice to have information about function arguments and types.

For instance

create table foo(
    a int unique
);

create table bar(
    b int primary key references foo(a)
);

create or replace function f(value foo, c int) returns bar as $$
SELECT ROW(value.*)::bar;
$$ language sql;

Actual output (pyrseas 0.8.0):

schema public:
  ...
  function f(value foo, c integer):
    language: sql
    owner: postgres
    returns: bar
    source: '

      SELECT ROW(value.*)::bar;

      '
  owner: postgres
  privileges:
  - PUBLIC:
    - all
  - postgres:
    - all
...

Desired output:

schema public:
  ...
  function f(public.foo, integer):
    arguments:
      - name: value
         schema: public
         table: foo
      - name: c
         type: integer
    returns:
      - schema: public 
         table: bar
    language: sql
    owner: postgres
    returns: bar
    source: '

      SELECT ROW(value.*)::bar;

      '
  owner: postgres
  privileges:
  - PUBLIC:
    - all
  - postgres:
    - all
...

or something like that

excavador avatar Feb 21 '18 21:02 excavador

I suppose my team can prepare patch for that. I need to know, it's possible to merge it to upstream (in theory) or we should consider forking. Thank you.

excavador avatar Feb 21 '18 21:02 excavador

Currently the info on the arguments is from pg_get_function_identity_arguments or pg_get_function_arguments, which we try to parse in a less than satisfactory way. The information is available in pg_proc.proargtypes as an oidvector, so if you query the catalog for function f above, you'll get something like 214577 23. You can get human-understandable types by issuing a query such as SELECT 214577::regtype, 23::regtype; which will return the tuple public.foo, integer but you'd want to retrieve this info together with the main query for all functions, not individually for each function (particularly if you have over 1500 functions as mentioned in your previous issue).

While working on r0.8, I experimented with int2vectors (present in pg_index and similar to oidvectors) and ARRAY functions to see if I could get index column/expression info without having to parse the output of pg_get_expr, but couldn't find a suitable solution. Admittedly, dealing with the proargtypes oidvector may be easier.

From the viewpoint of Pyrseas goals, in general we only want to add to the YAML output info that is needed or useful for SQL generation by yamltodb. Usually, the YAML parallels the output from pg_dump -s. It doesn't seem that explicitly separating datatype info for each argument would be useful, since the info is present in the current function signature (arguably, with less detail for record types). However, if a change could be made to show this info, without any significant performance degradation, and without impacting yamltodb, I would be amenable to merging it.

There is one thing that appears to need correction in any case. Function f depends on table foo (technically on the type foo) but our current YAML does not indicate this. Under certain circumstances, presumably f could be created before foo and that would cause a PG error. @dvarrazzo, do you agree?

jmafc avatar Feb 22 '18 02:02 jmafc

Currently the info on the arguments is from pg_get_function_identity_arguments or pg_get_function_arguments, which we try to parse in a less than satisfactory way. The information is available in pg_proc.proargtypes as an oidvector, so if you query the catalog for function f above, you'll get something like 214577 23. You can get human-understandable types by issuing a query such as SELECT 214577::regtype, 23::regtype; which will return the tuple public.foo, integer but you'd want to retrieve this info together with the main query for all functions, not individually for each function (particularly if you have over 1500 functions as mentioned in your previous issue).

I think it would be way easier to fetch the map oid -> (schema, name) for all the types in the db and use it to lookup in python, instead of trying to expand it in the query. Maybe this would be poor as a general method, but pg_proc has a lot of subtleties/inconsistencies (see proargtypes, proallargtypes in the docs) that makes just easier to throw a couple of ifs in Python.

There is one thing that appears to need correction in any case. Function f depends on table foo (technically on the type foo) but our current YAML does not indicate this. Under certain circumstances, presumably f could be created before foo and that would cause a PG error. @dvarrazzo, do you agree?

pg_depend contains dependencies between functions and types so topological sort should make sure types are created before the functions. I'm afraid I can't remember how we make use of this information in yaml though...

dvarrazzo avatar Feb 22 '18 12:02 dvarrazzo

There is one thing that appears to need correction in any case. Function f depends on table foo (technically on the type foo) but our current YAML does not indicate this. Under certain circumstances, presumably f could be created before foo and that would cause a PG error.

For me to understand the process correctly - wouldn't the same apply to type bar? When I try to follow the yaml generation - it does output depends_on for anything that is not inferable here. The function argument types are inferred here and the return type here so these not being listed in the yaml seams to be expected?

tbussmann avatar Feb 22 '18 14:02 tbussmann

Thank you. Seems like we need to parse arguments definition, it would be enough But dependencies bug is crucial.

excavador avatar Feb 22 '18 14:02 excavador

Daniele, because of the dependency cycle issue that I had recently while working on issue #176, as you know I was revisiting Post-Facto and they did have caches of "oft-referenced maps": languages, schemas (and reverse map), types, PG classes, operators, functions and pg_catalog relnames. Those would be useful, but to really take advantage of them would require quite a bit of refactoring too.

Tobias, yes, you're right: function f should also depend on table bar (since currently tables and row types based on those tables are treated equivalently).

jmafc avatar Feb 22 '18 15:02 jmafc

@excavador If I understand correctly, you and your team will deal with parsing arguments (or have dones so already) and issue is only open to address the apparently missing dependency that was pointed by @tbussmann. In case you haven't seen it already, please take a look at https://pyrseas.wordpress.com/2018/09/12/the-future-of-pyrseas-revisited/ . It would help if someone could construct a simple test to demonstrate the problem.

jmafc avatar Sep 14 '18 00:09 jmafc