tsvector2
Status of this project
Hello, thanks for your hard work on this feature. I have some questions.
- What is the new size limit of tsvector2, now that the 1MB limit is no longer relevant?
- I see you replaced the GIN index with a RUM index. Is a RUM index required for tsvector2?
- Did you try to get this merged into the main project?
- As PostgreSQL is on version 14 now, what is the overall status of this project? Did you find a better solution?
Thank you very much!
Hi,
- The 1MB limit does not apply to this type; the limit is the same as for other TOASTed types.
- RUM is optional. If the extension finds RUM when it is installed, it adds support functions for it.
- There was a patch to Postgres, but it didn't get through.
- I'm not working on this project anymore. But maybe I'll add support for newer Postgres versions some day.
Hi Ildus, thanks for your elaborate answers to these questions.
I have now "circumvented" 1MB tsvector limitation in PG14 by using array_to_tsvector(string_to_array(_clean_text(text_content), ' ')) where _clean_text is a function that removes special characters. array_to_tsvector returns tsvector without the positional information, and this is what gets me under the 1MB threshold.
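A minimal sketch of that workaround, for readers who want to try it. The body of _clean_text is not shown in this thread, so the version below is a hypothetical stand-in; the lowercasing and the array_remove call (which drops empty strings left by leading or trailing spaces) are also assumptions, not part of the original expression.

```sql
-- Hypothetical helper: the real _clean_text is not shown in the thread.
-- Assumption: replace every run of non-alphanumeric characters with a
-- single space and lowercase the result.
CREATE OR REPLACE FUNCTION _clean_text(input text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    SELECT lower(regexp_replace(input, '[^[:alnum:]]+', ' ', 'g'));
$$;

-- Build a tsvector from the raw words. array_to_tsvector stores each
-- distinct word once, without positions and without stemming.
SELECT array_to_tsvector(
           array_remove(
               string_to_array(_clean_text('Hello, big World! Hello again.'), ' '),
               ''  -- assumption: drop empty elements before conversion
           )
       ) AS vec;

-- Exact-word match against the unstemmed vector, using the 'simple'
-- configuration so the query term is lowercased but not stemmed.
SELECT array_to_tsvector(
           array_remove(
               string_to_array(_clean_text('Hello, big World! Hello again.'), ' '),
               ''
           )
       ) @@ to_tsquery('simple', 'world') AS matches;
```

Because the vector carries no positions, phrase search and position-based ranking are not available with this approach.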
Overall, it is still an extremely ugly hack which I don't like. But I also don't understand why there isn't something like bigtsvector with optional stemming so the use case of exact search is also covered in postgresql.
Yes, I think that is an expected feature. In the past I tried to replace the current tsvector type, but maybe a new type like bigtsvector, based on this extension or at least shipped as a contrib extension, could be accepted by the committers.
There was a patch to postgres, but it didn't get through
For reference, the patch was "Remove 1MB size limit for tsvector", but it was returned for feedback.
A new datatype, bigtsvector, was also mentioned in the thread "tsvector field length limitation".