graphql-engine
graphql-engine copied to clipboard
rfc (wip): parameterized queries
Parameterized Queries
Motivation
While graphql-engine's auto-generated API is reasonably expressive, it doesn't really compete with the expressivity of a query language that a database provides. You can express a lot more queries through SQL on Postgres, use a lot more operators on Postgres than through graphql-engine's Postgres API.
We want to provide our users an ability to fallback to a databases's query language when they run into cases where graphql-engine's API is not adequate. Currently graphql-engine addresses this by exposing GraphQL API over views and functions. For example, a view can be defined as follows:
create view page_items as
select title, content from articles
union all
select title, description from alerts
and graphql-engine provides this API:
query {
page_items(where: {..} limit: .. offset: ..) {
...
}
}
and similarly for the following function
CREATE FUNCTION search_articles(search text)
RETURNS SETOF articles AS $$
SELECT *
FROM articles
WHERE
title ilike ('%' || search || '%')
OR content ilike ('%' || search || '%')
$$ LANGUAGE sql STABLE;
graphql-engine provides this API:
query {
search_articles(args: {search: "something"} where: .. limit: .. offset: ..) {
...
}
}
While the above approach works, it has a couple of shortcomings:
1. Requirement of DDL privileges
Our users need DDL privileges to create these new functions or views. This privilege is not easily granted, particularly in an enterprise setup. The less we require of a database, the easier it would be for our users to add graphql-engine in their stack.
2. Restrictions on the function return type
The functions that graphql-engine can support currently need to return a 'table' or 'setof table' that's already tracked. There are couple of issues here:
-
Does not work for databases (such as SQL Server) that do not have the notion of functions returning named table types.
-
Does not support functions that do not return a table. For example, if the above
search_articles
function only returns title and content, graphql-engine cannot track it.CREATE FUNCTION search_page_items(search text) RETURNS SETOF TABLE (title text, content text) AS $$ SELECT title, content FROM articles WHERE title ilike ('%' || search || '%') OR content ilike ('%' || search || '%') $$ LANGUAGE sql STABLE;
To work around this, users go through this elaborate process:
- Create a table with the schema of the function return type
- Track the table
- Change the definition of the function to return the newly created table
- Track the function
which is terrible DX
To summarize the motivation is this:
We want our users to use the full query language ability of a database to extend graphql-engine's auto-generated API while addressing the shortcomings of the existing features (GraphQL API views and functions).
Parameterized queries
We want to provide a means to define views and functions at graphql-engine's layer.
-
A view's definition is a database select query so instead of creating a view, we can just accept the select query which backs the view. For example, we want to allow adding
page_items
root field as follows:root_field_name: page_items sql_query: | select title, content from articles union all select title, description from alerts
This should create the same GraphQL API for
page_items
field as it did when we created the viewpage_items
and tracked it:type query_root { ... page_items(distint_on: .. where: .. limit: .. offset: ..) page_items_aggregate(where: .. limit: .. offset: ..) } type subscription_root { ... page_items(distint_on: .. where: .. limit: .. offset: ..) page_items_aggregate(where: .. limit: .. offset: ..) }
-
A function's body may or may not be a simple query, but let's only consider the cases where a function's body is a query such as
search_articles
. In case ofsearch_articles
, we want to allowsearch_articles
root field as follows:root_field_name: search_articles arguments: - $1: search type: String sql_query: | SELECT * FROM articles WHERE title ilike ('%' || $1 || '%') OR content ilike ('%' || $1 || '%')
(Note how the
search
parameter used insearch_articles
function's body is replaced by$1
in the equivalent parameterized query)This should create the same GraphQL API for
search_articles
field as it did when we created thesearch_articles
function and tracked it.type query_root { ... search_articles(args: {search: ..} where: .. limit: .. offset: ..) search_articles_aggregate(args: {search: ..} where: .. limit: .. offset: ..) } type subscription_root { ... search_articles(args: {search: ..} where: .. limit: .. offset: ..) search_articles_aggregate(args: {search: ..} where: .. limit: .. offset: ..) }
In short, users can specify 'parameterized' database queries instead of defining functions and views on the database to add custom root fields
Note that the restrictions on function types that are mentioned in the motivation section can be addressed by extending our functions support. We can let our users specify the name of the table that a function returns (currently we get it from the function's type signature on the database) or a custom schema when they don't return an existing table. However, addressing the 'no ddl' requirement is more important and hence we are going forward with parameterized queries.
Product requirements
-
'Parameterized queries' should be a superset of our current functions support:
-
If the body of a function is a database query, one can use a parameterized query (like above).
-
If the body of a function is procedural and is not a single database query, one can use a parameterized query as follows:
select * from function($1, ... $n);
-
Support for subscriptions and aggregate fields like we currently have for immutable functions
-
-
Should allow mutations too - one should be able to mark a templated query as a query or a mutation so that the field can be added to query root or mutation root appropriately.
-
Should be a superset of our current views support:
-
Parameterized queries instead of defining views
select * from articles where is_verified = true
-
Parameterized queries to use existing views (
verified_articles
is a view on the database):select * from verified_articles;
-
Support for subscriptions and aggregate fields like we currently have on views
-
-
Support for relationships
-
From a parameterized query's response:
query { search_articles(args: {search: "something"}) { title content # author is a relationship from search_articles to author author { name } } }
-
To a parametrized query:
query { author { name # verified_articles is a relationship from author table to # 'verified_articles' paramaterized query verified_articles { title content } } }
-
-
Should allow defining queries which do not return tables or views:
select title, content from article;
In such cases, we expect the user to define the schema of this query's return type:
title: type: text nullable: true content: type: text nullable: true
-
Support for permissions:
-
We want to allow similar permissions like on tables/views.
- The columns that can be selected
- The rows that can be selected
- Whether the
_aggregate
should be generated
-
Support for presetting parameters in a query to either static values or values from session variables (like in insert and update permissions).
-
Product Specification
(See the Discussion section for other approaches considered, the following is based off suggestions made by Joe and Phil on the PR)
The biggest challenge is (5) - letting parameterized queries return table like data whose schema is not captured by any of the existing tables/views that are defined in the database. To address this, we introduce the notion of 'virtual tables' which lets one define the schema of table data at graphql-engine's layer. For example,
version:3
sources:
- name: default
kind: mssql
configuration: ..
tables: ...
virtual_tables:
- name: average_rating
fields:
- name: num_articles
type: int
- name: average_rating
type: float
defines a virtual table average_rating
with fields name
and average_rating
.
Parameterized queries are defined per source. The return type of a parameterized query can be one or more rows of a database table or a virtual table. For example,
version: 3
sources:
- name: default
kind: mssql
configuration: ...
tables:
- table: article
parameterized_queries:
- name: get_article
description: Get an article with an id
query: |
select * from articles where id = $1
parameters:
- param: $1
name: article_id
type: uuid
optional: false
response:
return_type:
table: article
cardinality: one
defines a parameterized query get_article
which returns a single row of type
article
.
The following example defines a parameterized query author_ratings
which
returns zero or many rows of the virtual table average_rating
.
version: 3
sources:
- name: source_name
kind: mssql
configuration: ...
virtual_tables:
- name: average_rating
fields:
- name: num_articles
type: int
- name: average_rating
type: float
parameterized_queries:
- name: author_ratings
description: Ratings of articles of various authors
query: |
select author_id, avg(rating) as average_rating
from article group_by author_id
parameters: []
response:
return_type:
virtual_table: average_rating
cardinality: many
Virtual tables metadata
Virtual tables are defined under a field called virtual_tables
under each
source.
virtual_tables:
# name of the virtual table, should be unique in the source
# this will also be used for the GraphQL type name
- name: name_of_the_virtual_table
# fields of the virtual table
fields:
# name of the field
- name: f1
# the database type of the field
type: database_scalar_type
# whether a query which returns this type would return nulls for this
# field
nullable: true
Parameterized queries metadata
Parameterized queres are defined under a field called parameterized_queries
under each source.
version: 3
sources:
- name: default
kind: postgres
configuration: ...
parameterized_queries:
- ..
- ..
The schema of each paramaterized query is defined as follows:
# The name of the parameterized query, which will also be the name
# of the field name that exposes this query in GraphQL API
name: author_ratings
# A description of this field, that will be published in the
# GraphQL API
comment: Ratings of articles of various authors
# A parameterized SQL query where parameters start with ':'
# See https://docs.sqlalchemy.org/en/13/core/tutorial.html#using-textual-sql
query: |
select * from x where c1 = :c1
# additional information about the parameters that are defined in the query
parameters:
# name of the parameter
- name: c1
# the type of the parameter: database scalar type
type: database_scalar_type
# (optional) whether the parameter is required or not - shows up in the
# GraphQL schema for the argument. `true` by default
required: true_or_false
# (optional) a default value for the parameter if not provided in the query
default: default_value_for_the_scalar
# return type of the parameterized query
return_type:
# can be one of virtual_table or table
table: name_of_the_table
virtual_table: name_of_the_virtual_table
# whether the query returns a zero or one row
# or whether it returns zero or many rows
# maybe we can come up with a better name than cardinality?
cardinality: one_or_many
Thanks for writing this up, Vamshi; I think this clarifies a lot of the questions I initially had about this feature!
From the Parameterized Queries section, I think the example snippet needs to be cleaned up to reflect the schema defined further down.
Unless I'm misunderstanding, it should be the following (plus the additional fields that are undefined):
parameterized_queries:
- name: search_articles
query: |
SELECT *
FROM articles
WHERE
title ilike ('%' || :search || '%')
OR content ilike ('%' || :search || '%')
parameters:
- name: search
type: text # Postgres native text scalar type.
Since virtual_tables
are defined at the top-level for a particular source, do we expect that there is any interaction with joins on remote relationships (i.e. generalized joins)?
That is: do we want to allow users to join a result from the virtual table returned by a parameterized query against some other value on a different database (e.g. parameterized query on Postgres joined with some column on MSSQL)?
What's the priority order for backends that we want to implement each of virtual_tables
and parameterized_queries
.
That is: do we want to do Postgres first, then MSSQL, then MySQL, etc.?
Would it be possible to extend this RFC to support the feature in issue #1357 as well? This doesn't look like it will support that case, since views/functions don't fully support that feature (you can't pass a dynamic JSON $where as a parameter to a SQL function).
@rccoe
you can't pass a dynamic JSON $where as a parameter to a SQL function
This use case will be supported in a future iteration. For example:
parameterized_queries:
- name: author_stats
query: |
SELECT a.author_id, count(*) as stat
FROM articles AS a
WHERE :articles_filter
GROUP BY author_id
parameters:
- name: articles_filter
type:
boolean_expression:
table: articles
table_identifier: a
That would work!
@0x777 Could you add returning scalars to the scope in the RFC as well, or perhaps mention it as a TODO?
Any updates?
It seems like parameterized queries would also be available as "computed fields" that could be added as a "computed column" for a table, is this true @0x777 ?
And if so, there's currently a limitation that computed fields/functions that accept an additional argument cannot be used in the where clause. It seems like parameterized queries would also have this limitation, yes?
#7094 has some additional information about this.
I had no idea this was in the pipeline, and it I'm so excited to see it happening according to the the latest community call schedule!
Question... assuming support for relationships, would these relationships be supported in permissions as well? My assumption is yes (this is what I would want) but just want to clarify.
Question... assuming support for relationships, would these relationships be supported in permissions as well? My assumption is yes (this is what I would want) but just want to clarify.
I am still collecting user feedback while dev work is starting on this feature and this is one of the subjects being discussed.
To clarify the behavior you want is -- If a user has access to table A but not table B. If they create a query that pulls data from table B based on a relationship you want to respect permission and return no data because the user doesn't have access to table B? In summary, parameterized queries should treat relationships and permissions in the same manner as the current HGE schema.
Sorry, I'm not entirely sure what I was asking 😅 I think I just wanted to make sure we could define relationships and permissions in the same way as we would with tables and views currently. i.e. right now we can add manual relationships on tables where there isn't a foreign key - I would hope to be able to do the same thing with these queries.
If they create a query that pulls data from table B based on a relationship you want to respect permission and return no data because the user doesn't have access to table B?
I don't think that's what I was looking for, but if there's a way of doing that, I would find that really interesting! Like if Hasura added a function to the database that could be called for a given request context and role to get the related objects that should be available for the current user? That would be really neat.
Does not work for databases (such as SQL Server) that do not have the notion of functions returning named table types.
SQL Server CREATE FUNCTION does have "Inline Table-Valued Function[s]." They don't return a named table, but would it be possible just to treat the function name as the name of a relation that can be tracked like any other?
Our users need DDL privileges to create these new functions or views. This privilege is not easily granted
Perhaps it's not granted for a good reason. Are we sure offering effectively a workaround to database access controls is a good idea?
Does not work for databases (such as SQL Server) that do not have the notion of functions returning named table types.
SQL Server CREATE FUNCTION does have "Inline Table-Valued Function[s]." They don't return a named table, but would it be possible just to treat the function name as the name of a relation that can be tracked like any other?
Taking a shot in the dark here: is it possible to query the information schema to figure out the return type of a function? Specifically, to figure out what columns exist in the relation and what the types of those relations are?
is it possible to query the information schema to figure out the return type of a function?
Yes.
There's a built-in stored procedure that does this in sp_describe_first_result_set
:
exec sp_describe_first_result_set @ProcName = N'dbo.search_articles(@search)'
There's also a built-in view that does this in sys.dm_exec_describe_first_result_set
:
select * from sys.dm_exec_describe_first_result_set(N'select * from dbo.search_articles(@search)')
is_hidden|column_ordinal|name |is_nullable|system_type_id|system_type_name|max_length|precision|scale|collation_name |user_type_id|user_type_database|user_type_schema|user_type_name|assembly_qualified_type_name|xml_collection_id|xml_collection_database|xml_collection_schema|xml_collection_name|is_xml_document|is_case_sensitive|is_fixed_length_clr_type|source_server|source_database|source_schema|source_table|source_column|is_identity_column|is_part_of_unique_key|is_updateable|is_computed_column|is_sparse_column_set|ordinal_in_order_by_list|order_by_is_descending|order_by_list_length|tds_type_id|tds_length|tds_collation_id|tds_collation_sort_id|
---------+--------------+-------+-----------+--------------+----------------+----------+---------+-----+----------------------------+------------+------------------+----------------+--------------+----------------------------+-----------------+-----------------------+---------------------+-------------------+---------------+-----------------+------------------------+-------------+---------------+-------------+------------+-------------+------------------+---------------------+-------------+------------------+--------------------+------------------------+----------------------+--------------------+-----------+----------+----------------+---------------------+
0| 1|id | 0| 56|int | 4| 10| 0| | | | | | | | | | | 0| 0| 0| | | | | | 0| | 1| 0| 0| | | | 56| 4| | |
0| 2|title | 1| 167|varchar(100) | 100| 0| 0|SQL_Latin1_General_CP1_CI_AS| | | | | | | | | | 0| 0| 0| | | | | | 0| | 1| 0| 0| | | | 167| 100| 13632521| 52|
0| 3|content| 1| 167|varchar(100) | 100| 0| 0|SQL_Latin1_General_CP1_CI_AS| | | | | | | | | | 0| 0| 0| | | | | | 0| | 1| 0| 0| | | | 167| 100| 13632521| 52|
where search_results
is defined like:
CREATE FUNCTION search_articles(@search varchar)
RETURNS table return (
SELECT *
FROM articles
WHERE
title like ('%' + @search + '%')
OR content like ('%' + @search + '%')
);
This has been implemented as Native Queries and will be available in v2.28, due to be released shortly. It's free for PostgreSQL and available on Cloud and Enterprise for MS SQL Server and BigQuery.
Check out the documentation to learn more!
If you're interested in this and other upcoming features, I recommend you check out HasuraCon 2023, starting in about 5 hours. 😀