postgrest icon indicating copy to clipboard operation
postgrest copied to clipboard

Change FTS operators to explicitly call to_tsvector()

Open dsyrstad opened this issue 2 years ago • 11 comments

Environment

  • PostgreSQL version: PostgreSQL 13.6 (Ubuntu 13.6-0ubuntu0.21.10.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, 64-bit
  • PostgREST version: postgrest/postgrest:v9.0.0
  • Operating system: Ubuntu 21.10

Description of issue

I cannot perform a full-text search on json or jsonb column types.

Steps to reproduce

  1. Perform a query such as GET /contacts?select=jsonb_field&jsonb_field_1=fts.jsonvalue

Actual result

{
    "hint": "No operator matches the given name and argument types. You might need to add explicit type casts.",
    "details": null,
    "code": "42883",
    "message": "operator does not exist: jsonb @@ tsquery"
}

The query produced contains the SQL:

WHERE  "public"."contacts"."jsonb_field" @@ to_tsquery($1) 

Expected Result

I'm able to perform a full-text search on a json/jsonb column. There is no workaround with PostgREST at present.

Possible solution

Instead of generating:

WHERE  "public"."contacts"."jsonb_field" @@ to_tsquery($1) 

...generate:

WHERE  to_tsvector("public"."contacts"."jsonb_field") @@ to_tsquery($1) 

I believe this is to be exactly equivalent. According to the doc, to_tsvector() is implicitly called with text:

text @@ tsquery → boolean

Does text string, after implicit invocation of to_tsvector(), match tsquery?

By explicitly calling to_tsvector(), json/jsonb columns would also be handled.

dsyrstad avatar Apr 20 '22 19:04 dsyrstad

According to the doc, to_tsvector() is implicitly called with text:

@dsyrstad Hm, if the above is true, then I believe just doing a ->> on a field of the jsonb should work?

Like:

GET /contacts?select=jsonb_field&jsonb_field_1->>field=fts.jsonvalue

steve-chavez avatar Apr 20 '22 20:04 steve-chavez

Or do you want to search across all the jsonb value. Hm, I do see that's supported by tsvector:

postgres=# \df to_tsvector
                            List of functions
   Schema   |    Name     | Result data type | Argument data types | Type
------------+-------------+------------------+---------------------+------
 pg_catalog | to_tsvector | tsvector         | json                | func
 pg_catalog | to_tsvector | tsvector         | jsonb               | func
 pg_catalog | to_tsvector | tsvector         | regconfig, json     | func
 pg_catalog | to_tsvector | tsvector         | regconfig, jsonb    | func
 pg_catalog | to_tsvector | tsvector         | regconfig, text     | func
 pg_catalog | to_tsvector | tsvector         | text                | func

steve-chavez avatar Apr 20 '22 20:04 steve-chavez

We've discusssed adding modifiers to operators in https://github.com/PostgREST/postgrest/issues/1943#issuecomment-985252446. So perhaps we can have:

GET /contacts?jsonb_field_1=to_tsvector.fts.jsonvalue

steve-chavez avatar Apr 20 '22 20:04 steve-chavez

@steve-chavez Yes, I want to search across all jsonb values. It's pretty convenient.

Having a to_tsvector modifier would work. But the solution I proposed should be exactly equivalent, plus expands the functionality without have to specify anything more in the query.

Curious - could we also get casts with modifiers? You can currently cast in select, but not in filter operators.

dsyrstad avatar Apr 20 '22 20:04 dsyrstad

But the solution I proposed should be exactly equivalent, plus expands the functionality without have to specify anything more in the query.

Ah, I see. So just call to_tsvector for all. We'd have to test if that breaks any tests I guess.

Would you like to give it a shot? Relevant file is https://github.com/PostgREST/postgrest/blob/main/src/PostgREST/Query/SqlFragment.hs#L99

Curious - could we also get casts with modifiers?

Hm, we frown upon doing that because any client could invalidate indexes and generate slow queries.

steve-chavez avatar Apr 20 '22 20:04 steve-chavez

@steve-chavez Normally I would raise a PR, but in this case I don't have Haskell skills, nor an environment set up to build it on.

dsyrstad avatar Apr 20 '22 21:04 dsyrstad

Kind of similar to what we did in #2145 - making the calls explicit, opens up new possibilities.

wolfgangwalther avatar Apr 21 '22 10:04 wolfgangwalther

I tested adding to_tsvector to this line:

https://github.com/PostgREST/postgrest/blob/115dae748463287ec4fff7da1cedc982411450d0/src/PostgREST/Query/SqlFragment.hs#L258

Result:

pgFmtFieldFts op = "to_tsvector(" <> pgFmtField table fld <> ")" <> " " <> SQL.sql (ftsOperator op)

This breaks all the fts tests where the column type is tsvector. For instance, one of the failing tests returns this filter in the query (text_search_vector is a tsvector column), which throws an error:

  WHERE  to_tsvector("api"."tsearch"."text_search_vector") @@ to_tsquery($1) 

Can't find a way to determine the type of the column, maybe the schema cache needs to be used here, or perhaps adding the to_tsvector modifier to the fts operator like Steve mentioned is needed?

laurenceisla avatar Apr 26 '22 18:04 laurenceisla

Can't find a way to determine the type of the column, maybe the schema cache needs to be used here

Now that tableColumns is easier to get (already inside our Table in the internal cache) I think we could do that.

Only need to be careful to not search for the column for every other filter - we could also use a load test to see if noticeable perf is lost for fts.

steve-chavez avatar Apr 27 '22 16:04 steve-chavez