postgrest
postgrest copied to clipboard
Change FTS operators to explicitly call to_tsvector()
Environment
- PostgreSQL version: PostgreSQL 13.6 (Ubuntu 13.6-0ubuntu0.21.10.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, 64-bit
- PostgREST version: postgrest/postgrest:v9.0.0
- Operating system: Ubuntu 21.10
Description of issue
I cannot perform a full-text search on json or jsonb column types.
Steps to reproduce
- Perform a query such as
GET /contacts?select=jsonb_field&jsonb_field_1=fts.jsonvalue
Actual result
{
"hint": "No operator matches the given name and argument types. You might need to add explicit type casts.",
"details": null,
"code": "42883",
"message": "operator does not exist: jsonb @@ tsquery"
}
The query produced contains the SQL:
WHERE "public"."contacts"."jsonb_field" @@ to_tsquery($1)
Expected Result
I'm able to perform a full-text search on a json/jsonb column. There is no workaround with PostgREST at present.
Possible solution
Instead of generating:
WHERE "public"."contacts"."jsonb_field" @@ to_tsquery($1)
...generate:
WHERE to_tsvector("public"."contacts"."jsonb_field") @@ to_tsquery($1)
I believe this is to be exactly equivalent. According to the doc, to_tsvector()
is implicitly called with text:
text @@ tsquery → boolean
Does text string, after implicit invocation of to_tsvector(), match tsquery?
By explicitly calling to_tsvector()
, json/jsonb columns would also be handled.
According to the doc, to_tsvector() is implicitly called with text:
@dsyrstad Hm, if the above is true, then I believe just doing a ->>
on a field of the jsonb should work?
Like:
GET /contacts?select=jsonb_field&jsonb_field_1->>field=fts.jsonvalue
Or do you want to search across all the jsonb
value. Hm, I do see that's supported by tsvector
:
postgres=# \df to_tsvector
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-------------+------------------+---------------------+------
pg_catalog | to_tsvector | tsvector | json | func
pg_catalog | to_tsvector | tsvector | jsonb | func
pg_catalog | to_tsvector | tsvector | regconfig, json | func
pg_catalog | to_tsvector | tsvector | regconfig, jsonb | func
pg_catalog | to_tsvector | tsvector | regconfig, text | func
pg_catalog | to_tsvector | tsvector | text | func
We've discusssed adding modifiers to operators in https://github.com/PostgREST/postgrest/issues/1943#issuecomment-985252446. So perhaps we can have:
GET /contacts?jsonb_field_1=to_tsvector.fts.jsonvalue
@steve-chavez Yes, I want to search across all jsonb values. It's pretty convenient.
Having a to_tsvector
modifier would work. But the solution I proposed should be exactly equivalent, plus expands the functionality without have to specify anything more in the query.
Curious - could we also get casts with modifiers? You can currently cast in select
, but not in filter operators.
But the solution I proposed should be exactly equivalent, plus expands the functionality without have to specify anything more in the query.
Ah, I see. So just call to_tsvector
for all. We'd have to test if that breaks any tests I guess.
Would you like to give it a shot? Relevant file is https://github.com/PostgREST/postgrest/blob/main/src/PostgREST/Query/SqlFragment.hs#L99
Curious - could we also get casts with modifiers?
Hm, we frown upon doing that because any client could invalidate indexes and generate slow queries.
@steve-chavez Normally I would raise a PR, but in this case I don't have Haskell skills, nor an environment set up to build it on.
Kind of similar to what we did in #2145 - making the calls explicit, opens up new possibilities.
I tested adding to_tsvector
to this line:
https://github.com/PostgREST/postgrest/blob/115dae748463287ec4fff7da1cedc982411450d0/src/PostgREST/Query/SqlFragment.hs#L258
Result:
pgFmtFieldFts op = "to_tsvector(" <> pgFmtField table fld <> ")" <> " " <> SQL.sql (ftsOperator op)
This breaks all the fts
tests where the column type is tsvector
. For instance, one of the failing tests returns this filter in the query (text_search_vector
is a tsvector
column), which throws an error:
WHERE to_tsvector("api"."tsearch"."text_search_vector") @@ to_tsquery($1)
Can't find a way to determine the type of the column, maybe the schema cache needs to be used here, or perhaps adding the to_tsvector
modifier to the fts
operator like Steve mentioned is needed?
Can't find a way to determine the type of the column, maybe the schema cache needs to be used here
Now that tableColumns is easier to get (already inside our Table in the internal cache) I think we could do that.
Only need to be careful to not search for the column for every other filter - we could also use a load test to see if noticeable perf is lost for fts
.