graphql-engine icon indicating copy to clipboard operation
graphql-engine copied to clipboard

rfc (wip): parameterized queries

Open 0x777 opened this issue 2 years ago • 7 comments

Parameterized Queries

Motivation

While graphql-engine's auto-generated API is reasonably expressive, it doesn't really compete with the expressivity of a query language that a database provides. You can express a lot more queries through SQL on Postgres, use a lot more operators on Postgres than through graphql-engine's Postgres API.

We want to provide our users an ability to fallback to a databases's query language when they run into cases where graphql-engine's API is not adequate. Currently graphql-engine addresses this by exposing GraphQL API over views and functions. For example, a view can be defined as follows:

create view page_items as
  select title, content from articles
  union all
  select title, description from alerts

and graphql-engine provides this API:

query {
  page_items(where: {..} limit: .. offset: ..) {
    ...
  }
}

and similarly for the following function

CREATE FUNCTION search_articles(search text)
RETURNS SETOF articles AS $$
  SELECT *
  FROM articles
  WHERE
    title ilike ('%' || search || '%')
    OR content ilike ('%' || search || '%')
$$ LANGUAGE sql STABLE;

graphql-engine provides this API:

query {
  search_articles(args: {search: "something"} where: .. limit: .. offset: ..) {
    ...
  }
}

While the above approach works, it has a couple of shortcomings:

1. Requirement of DDL privileges

Our users need DDL privileges to create these new functions or views. This privilege is not easily granted, particularly in an enterprise setup. The less we require of a database, the easier it would be for our users to add graphql-engine in their stack.

2. Restrictions on the function return type

The functions that graphql-engine can support currently need to return a 'table' or 'setof table' that's already tracked. There are couple of issues here:

  1. Does not work for databases (such as SQL Server) that do not have the notion of functions returning named table types.

  2. Does not support functions that do not return a table. For example, if the above search_articles function only returns title and content, graphql-engine cannot track it.

    CREATE FUNCTION search_page_items(search text)
    RETURNS SETOF TABLE (title text, content text) AS $$
      SELECT title, content
      FROM articles
      WHERE
        title ilike ('%' || search || '%')
        OR content ilike ('%' || search || '%')
    $$ LANGUAGE sql STABLE;
    

    To work around this, users go through this elaborate process:

    1. Create a table with the schema of the function return type
    2. Track the table
    3. Change the definition of the function to return the newly created table
    4. Track the function

    which is terrible DX

To summarize the motivation is this:

We want our users to use the full query language ability of a database to extend graphql-engine's auto-generated API while addressing the shortcomings of the existing features (GraphQL API views and functions).

Parameterized queries

We want to provide a means to define views and functions at graphql-engine's layer.

  1. A view's definition is a database select query so instead of creating a view, we can just accept the select query which backs the view. For example, we want to allow adding page_items root field as follows:

    root_field_name: page_items
    sql_query: |
      select title, content from articles
      union all
      select title, description from alerts
    

    This should create the same GraphQL API for page_items field as it did when we created the view page_items and tracked it:

    type query_root {
      ...
      page_items(distint_on: .. where: .. limit: .. offset: ..)
      page_items_aggregate(where: .. limit: .. offset: ..)
    }
    type subscription_root {
      ...
      page_items(distint_on: .. where: .. limit: .. offset: ..)
      page_items_aggregate(where: .. limit: .. offset: ..)
    }
    
  2. A function's body may or may not be a simple query, but let's only consider the cases where a function's body is a query such as search_articles. In case of search_articles, we want to allow search_articles root field as follows:

    root_field_name: search_articles
    arguments:
    - $1: search
      type: String
    sql_query: |
      SELECT *
      FROM articles
      WHERE
        title ilike ('%' || $1 || '%')
        OR content ilike ('%' || $1 || '%')
    

    (Note how the search parameter used in search_articles function's body is replaced by $1 in the equivalent parameterized query)

    This should create the same GraphQL API for search_articles field as it did when we created the search_articles function and tracked it.

    type query_root {
      ...
      search_articles(args: {search: ..} where: .. limit: .. offset: ..)
      search_articles_aggregate(args: {search: ..} where: .. limit: .. offset: ..)
    }
    type subscription_root {
      ...
      search_articles(args: {search: ..} where: .. limit: .. offset: ..)
      search_articles_aggregate(args: {search: ..} where: .. limit: .. offset: ..)
    }
    

In short, users can specify 'parameterized' database queries instead of defining functions and views on the database to add custom root fields

Note that the restrictions on function types that are mentioned in the motivation section can be addressed by extending our functions support. We can let our users specify the name of the table that a function returns (currently we get it from the function's type signature on the database) or a custom schema when they don't return an existing table. However, addressing the 'no ddl' requirement is more important and hence we are going forward with parameterized queries.

Product requirements

  1. 'Parameterized queries' should be a superset of our current functions support:

    1. If the body of a function is a database query, one can use a parameterized query (like above).

    2. If the body of a function is procedural and is not a single database query, one can use a parameterized query as follows:

      select * from function($1, ... $n);
      
    3. Support for subscriptions and aggregate fields like we currently have for immutable functions

  2. Should allow mutations too - one should be able to mark a templated query as a query or a mutation so that the field can be added to query root or mutation root appropriately.

  3. Should be a superset of our current views support:

    1. Parameterized queries instead of defining views

      select * from articles where is_verified = true
      
    2. Parameterized queries to use existing views (verified_articles is a view on the database):

      select * from verified_articles;
      
    3. Support for subscriptions and aggregate fields like we currently have on views

  4. Support for relationships

    1. From a parameterized query's response:

      query {
        search_articles(args: {search: "something"}) {
          title
          content
          # author is a relationship from search_articles to author
          author {
            name
          }
        }
      }
      
    2. To a parametrized query:

      query {
        author {
          name
          # verified_articles is a relationship from author table to
          # 'verified_articles' paramaterized query
          verified_articles {
            title
            content
          }
        }
      }
      
  5. Should allow defining queries which do not return tables or views:

    select title, content from article;
    

    In such cases, we expect the user to define the schema of this query's return type:

    title:
      type: text
      nullable: true
    content:
      type: text
      nullable: true
    
  6. Support for permissions:

    1. We want to allow similar permissions like on tables/views.

      1. The columns that can be selected
      2. The rows that can be selected
      3. Whether the _aggregate should be generated
    2. Support for presetting parameters in a query to either static values or values from session variables (like in insert and update permissions).

Product Specification

(See the Discussion section for other approaches considered, the following is based off suggestions made by Joe and Phil on the PR)

The biggest challenge is (5) - letting parameterized queries return table like data whose schema is not captured by any of the existing tables/views that are defined in the database. To address this, we introduce the notion of 'virtual tables' which lets one define the schema of table data at graphql-engine's layer. For example,

version:3
sources:
- name: default
  kind: mssql
  configuration: ..
  tables: ...
  virtual_tables:
  - name: average_rating
    fields:
    - name: num_articles
      type: int
    - name: average_rating
      type: float

defines a virtual table average_rating with fields name and average_rating.

Parameterized queries are defined per source. The return type of a parameterized query can be one or more rows of a database table or a virtual table. For example,

version: 3
sources:
- name: default
  kind: mssql
  configuration: ...
  tables:
  - table: article
  parameterized_queries:
  - name: get_article
    description: Get an article with an id
    query: |
      select * from articles where id = $1
    parameters:
    - param: $1
      name: article_id
      type: uuid
      optional: false
    response:
      return_type:
        table: article
        cardinality: one

defines a parameterized query get_article which returns a single row of type article.

The following example defines a parameterized query author_ratings which returns zero or many rows of the virtual table average_rating.

version: 3
sources:
- name: source_name
  kind: mssql
  configuration: ...
  virtual_tables:
  - name: average_rating
    fields:
    - name: num_articles
      type: int
    - name: average_rating
      type: float
  parameterized_queries:
  - name: author_ratings
    description: Ratings of articles of various authors
    query: |
      select author_id, avg(rating) as average_rating
      from article group_by author_id
    parameters: []
    response:
      return_type:
        virtual_table: average_rating
        cardinality: many

Virtual tables metadata

Virtual tables are defined under a field called virtual_tables under each source.

virtual_tables:
  # name of the virtual table, should be unique in the source
  # this will also be used for the GraphQL type name
- name: name_of_the_virtual_table

  # fields of the virtual table
  fields:
    # name of the field
  - name: f1
    # the database type of the field
    type: database_scalar_type
    # whether a query which returns this type would return nulls for this
    # field
    nullable: true

Parameterized queries metadata

Parameterized queres are defined under a field called parameterized_queries under each source.

version: 3
sources:
- name: default
  kind: postgres
  configuration: ...
  parameterized_queries:
  - ..
  - ..

The schema of each paramaterized query is defined as follows:


# The name of the parameterized query, which will also be the name
# of the field name that exposes this query in GraphQL API

name: author_ratings

# A description of this field, that will be published in the
# GraphQL API

comment: Ratings of articles of various authors

# A parameterized SQL query where parameters start with ':'
# See https://docs.sqlalchemy.org/en/13/core/tutorial.html#using-textual-sql
query: |
  select * from x where c1 = :c1

# additional information about the parameters that are defined in the query
parameters:

  # name of the parameter
- name: c1

  # the type of the parameter: database scalar type
  type: database_scalar_type

  # (optional) whether the parameter is required or not - shows up in the
  # GraphQL schema for the argument. `true` by default
  required: true_or_false

  # (optional) a default value for the parameter if not provided in the query
  default: default_value_for_the_scalar

# return type of the parameterized query
return_type:

  # can be one of virtual_table or table
  table: name_of_the_table
  virtual_table: name_of_the_virtual_table

  # whether the query returns a zero or one row
  # or whether it returns zero or many rows
  # maybe we can come up with a better name than cardinality?
  cardinality: one_or_many

0x777 avatar Aug 26 '21 15:08 0x777

Thanks for writing this up, Vamshi; I think this clarifies a lot of the questions I initially had about this feature!


From the Parameterized Queries section, I think the example snippet needs to be cleaned up to reflect the schema defined further down.

Unless I'm misunderstanding, it should be the following (plus the additional fields that are undefined):

parameterized_queries:
- name: search_articles
  query: |
    SELECT *
    FROM articles
    WHERE
      title ilike ('%' || :search || '%')
      OR content ilike ('%' || :search || '%')
  parameters:
  - name: search
    type: text # Postgres native text scalar type.

Since virtual_tables are defined at the top-level for a particular source, do we expect that there is any interaction with joins on remote relationships (i.e. generalized joins)?

That is: do we want to allow users to join a result from the virtual table returned by a parameterized query against some other value on a different database (e.g. parameterized query on Postgres joined with some column on MSSQL)?


What's the priority order for backends that we want to implement each of virtual_tables and parameterized_queries.

That is: do we want to do Postgres first, then MSSQL, then MySQL, etc.?

jkachmar avatar Aug 26 '21 16:08 jkachmar

Would it be possible to extend this RFC to support the feature in issue #1357 as well? This doesn't look like it will support that case, since views/functions don't fully support that feature (you can't pass a dynamic JSON $where as a parameter to a SQL function).

rccoe avatar Sep 07 '21 14:09 rccoe

@rccoe

you can't pass a dynamic JSON $where as a parameter to a SQL function

This use case will be supported in a future iteration. For example:

parameterized_queries:
- name: author_stats
  query: |
    SELECT a.author_id, count(*) as stat 
    FROM articles AS a
    WHERE :articles_filter
    GROUP BY author_id
  parameters:
  - name: articles_filter
    type: 
      boolean_expression:
        table: articles
        table_identifier: a

0x777 avatar Sep 13 '21 13:09 0x777

That would work!

rccoe avatar Sep 13 '21 14:09 rccoe

@0x777 Could you add returning scalars to the scope in the RFC as well, or perhaps mention it as a TODO?

coco98 avatar Oct 06 '21 15:10 coco98

Any updates?

maxpain avatar Apr 21 '22 10:04 maxpain

It seems like parameterized queries would also be available as "computed fields" that could be added as a "computed column" for a table, is this true @0x777 ?

And if so, there's currently a limitation that computed fields/functions that accept an additional argument cannot be used in the where clause. It seems like parameterized queries would also have this limitation, yes?

#7094 has some additional information about this.

jcarrus avatar Jul 21 '22 23:07 jcarrus

I had no idea this was in the pipeline, and it I'm so excited to see it happening according to the the latest community call schedule!

Question... assuming support for relationships, would these relationships be supported in permissions as well? My assumption is yes (this is what I would want) but just want to clarify.

raphaeltm avatar Dec 01 '22 03:12 raphaeltm

Question... assuming support for relationships, would these relationships be supported in permissions as well? My assumption is yes (this is what I would want) but just want to clarify.

I am still collecting user feedback while dev work is starting on this feature and this is one of the subjects being discussed.

To clarify the behavior you want is -- If a user has access to table A but not table B. If they create a query that pulls data from table B based on a relationship you want to respect permission and return no data because the user doesn't have access to table B? In summary, parameterized queries should treat relationships and permissions in the same manner as the current HGE schema.

dameleney avatar Dec 15 '22 08:12 dameleney

Sorry, I'm not entirely sure what I was asking 😅 I think I just wanted to make sure we could define relationships and permissions in the same way as we would with tables and views currently. i.e. right now we can add manual relationships on tables where there isn't a foreign key - I would hope to be able to do the same thing with these queries.

If they create a query that pulls data from table B based on a relationship you want to respect permission and return no data because the user doesn't have access to table B?

I don't think that's what I was looking for, but if there's a way of doing that, I would find that really interesting! Like if Hasura added a function to the database that could be called for a given request context and role to get the related objects that should be available for the current user? That would be really neat.

raphaeltm avatar Dec 16 '22 23:12 raphaeltm

Does not work for databases (such as SQL Server) that do not have the notion of functions returning named table types.

SQL Server CREATE FUNCTION does have "Inline Table-Valued Function[s]." They don't return a named table, but would it be possible just to treat the function name as the name of a relation that can be tracked like any other?

dventimihasura avatar Jan 18 '23 23:01 dventimihasura

Our users need DDL privileges to create these new functions or views. This privilege is not easily granted

Perhaps it's not granted for a good reason. Are we sure offering effectively a workaround to database access controls is a good idea?

dventimihasura avatar Jan 18 '23 23:01 dventimihasura

Does not work for databases (such as SQL Server) that do not have the notion of functions returning named table types.

SQL Server CREATE FUNCTION does have "Inline Table-Valued Function[s]." They don't return a named table, but would it be possible just to treat the function name as the name of a relation that can be tracked like any other?

Taking a shot in the dark here: is it possible to query the information schema to figure out the return type of a function? Specifically, to figure out what columns exist in the relation and what the types of those relations are?

BenoitRanque avatar Jan 19 '23 04:01 BenoitRanque

is it possible to query the information schema to figure out the return type of a function?

Yes.

There's a built-in stored procedure that does this in sp_describe_first_result_set:

exec sp_describe_first_result_set @ProcName = N'dbo.search_articles(@search)'

There's also a built-in view that does this in sys.dm_exec_describe_first_result_set:

select * from sys.dm_exec_describe_first_result_set(N'select * from dbo.search_articles(@search)')
is_hidden|column_ordinal|name   |is_nullable|system_type_id|system_type_name|max_length|precision|scale|collation_name              |user_type_id|user_type_database|user_type_schema|user_type_name|assembly_qualified_type_name|xml_collection_id|xml_collection_database|xml_collection_schema|xml_collection_name|is_xml_document|is_case_sensitive|is_fixed_length_clr_type|source_server|source_database|source_schema|source_table|source_column|is_identity_column|is_part_of_unique_key|is_updateable|is_computed_column|is_sparse_column_set|ordinal_in_order_by_list|order_by_is_descending|order_by_list_length|tds_type_id|tds_length|tds_collation_id|tds_collation_sort_id|
---------+--------------+-------+-----------+--------------+----------------+----------+---------+-----+----------------------------+------------+------------------+----------------+--------------+----------------------------+-----------------+-----------------------+---------------------+-------------------+---------------+-----------------+------------------------+-------------+---------------+-------------+------------+-------------+------------------+---------------------+-------------+------------------+--------------------+------------------------+----------------------+--------------------+-----------+----------+----------------+---------------------+
        0|             1|id     |          0|            56|int             |         4|       10|    0|                            |            |                  |                |              |                            |                 |                       |                     |                   |              0|                0|                       0|             |               |             |            |             |                 0|                     |            1|                 0|                   0|                        |                      |                    |         56|         4|                |                     |
        0|             2|title  |          1|           167|varchar(100)    |       100|        0|    0|SQL_Latin1_General_CP1_CI_AS|            |                  |                |              |                            |                 |                       |                     |                   |              0|                0|                       0|             |               |             |            |             |                 0|                     |            1|                 0|                   0|                        |                      |                    |        167|       100|        13632521|                   52|
        0|             3|content|          1|           167|varchar(100)    |       100|        0|    0|SQL_Latin1_General_CP1_CI_AS|            |                  |                |              |                            |                 |                       |                     |                   |              0|                0|                       0|             |               |             |            |             |                 0|                     |            1|                 0|                   0|                        |                      |                    |        167|       100|        13632521|                   52|

where search_results is defined like:

CREATE FUNCTION search_articles(@search varchar)
RETURNS table return (
  SELECT *
  FROM articles
  WHERE
    title like ('%' + @search + '%')
    OR content like ('%' + @search + '%')
);

dventimihasura avatar Jan 20 '23 19:01 dventimihasura

This has been implemented as Native Queries and will be available in v2.28, due to be released shortly. It's free for PostgreSQL and available on Cloud and Enterprise for MS SQL Server and BigQuery.

Check out the documentation to learn more!

If you're interested in this and other upcoming features, I recommend you check out HasuraCon 2023, starting in about 5 hours. 😀

SamirTalwar avatar Jun 20 '23 10:06 SamirTalwar