tsvector2

Status of this project

Open bf opened this issue 3 years ago • 4 comments
Hello, thanks for your hard work on this feature. I have some questions.

  • What is the new size limit of tsvector2, now that the 1MB limit is no longer relevant?
  • I see you replaced the GIN index with a RUM index. Is a RUM index required for tsvector2?
  • Did you try to get this merged into the main project?
  • As PostgreSQL is on version 14 now, what is the overall status of this project? Did you find a better solution?

Thank you very much!

bf avatar Jul 13 '22 16:07 bf

Hi,

  1. The 1MB limit does not apply to this type; the limit is the same as for other TOASTed types.
  2. RUM is optional. If the extension finds RUM when it is installed, it adds support functions for it.
  3. There was a patch to Postgres, but it didn't get through.
  4. I'm not working on this project anymore, but maybe I'll add support for newer Postgres versions some day.

ildus avatar Aug 04 '22 10:08 ildus

Hi Ildus, thanks for your elaborate answers to these questions.

I have now "circumvented" the 1MB tsvector limitation in PG14 by using array_to_tsvector(string_to_array(_clean_text(text_content), ' ')), where _clean_text is a function that removes special characters. array_to_tsvector returns a tsvector without the positional information, and this is what gets me under the 1MB threshold.
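A minimal sketch of this workaround, for readers who want to try it. The body of _clean_text below is purely an assumption (the real function is not shown in this thread); it lowercases the input and collapses every run of non-ASCII-alphanumeric characters into a single space, so that string_to_array never produces empty elements. The sample sentence is made up for illustration.

```sql
-- Hypothetical stand-in for _clean_text; the actual implementation is not
-- shown in the thread. Note that this version also drops non-ASCII letters.
CREATE OR REPLACE FUNCTION _clean_text(input text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    -- Lowercase, turn each run of other characters into one space, trim the ends.
    SELECT btrim(regexp_replace(lower(input), '[^a-z0-9]+', ' ', 'g'));
$$;

-- Workaround from the post: each distinct word becomes a lexeme,
-- stored without any position information.
SELECT array_to_tsvector(
           string_to_array(_clean_text('The quick, brown fox; the QUICK fox!'), ' ')
       ) AS without_positions;

-- For comparison: the built-in 'simple' configuration also skips stemming,
-- but it keeps a position for every occurrence of every word.
SELECT to_tsvector('simple', 'The quick, brown fox; the QUICK fox!') AS with_positions;

-- Exact-word matching still works on the position-free vector.
SELECT array_to_tsvector(
           string_to_array(_clean_text('The quick, brown fox; the QUICK fox!'), ' ')
       ) @@ 'quick & fox'::tsquery AS matches;
```

Because the resulting tsvector carries no positions, phrase queries (the <-> operator) and position-based ranking cannot work on it; only plain lexeme matching is available.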

Overall, it is still an extremely ugly hack which I don't like. But I also don't understand why there isn't something like bigtsvector with optional stemming, so that the use case of exact search is also covered in PostgreSQL.

bf avatar Aug 04 '22 10:08 bf

Yes, I think that is an expected feature. In the past I tried to replace the current tsvector type, but maybe a new type like bigtsvector, based on this extension or at least shipped as a contrib extension, could be accepted by the committers.

ildus avatar Aug 05 '22 10:08 ildus

There was a patch to Postgres, but it didn't get through

For reference, the patch was "Remove 1MB size limit for tsvector", but it was returned for feedback.

A new datatype, bigtsvector, was also mentioned in the thread "tsvector field length limitation".

mguinness avatar Nov 02 '23 23:11 mguinness