PgVector: support hybrid search

Open hrhrng opened this issue 1 year ago • 11 comments

Issue

Closes #1599

Change

Implement full-text search and hybrid search in dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore
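
For readers unfamiliar with hybrid search, one common way to combine a vector ranking with a full-text ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only: the thread does not say which fusion strategy this PR uses, and the class `RrfFusion`, the method `fuse`, and the constant 60 are assumptions, not LangChain4j API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of LangChain4j or this PR.
public class RrfFusion {

    // Merges two ranked lists of document ids into one list, best first.
    // k dampens the weight of top ranks; 60 is a commonly used value.
    public static List<String> fuse(List<String> vectorRanked, List<String> textRanked, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        accumulate(scores, vectorRanked, k);
        accumulate(scores, textRanked, k);
        List<String> ids = new ArrayList<>(scores.keySet());
        // Sort by fused score, highest first.
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ids;
    }

    private static void accumulate(Map<String, Double> scores, List<String> ranked, int k) {
        for (int i = 0; i < ranked.size(); i++) {
            // Ranks are 1-based, so each hit contributes 1 / (k + rank).
            scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
        }
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("a", "b", "c");
        List<String> textHits = List.of("b", "d", "a");
        System.out.println(fuse(vectorHits, textHits, 60)); // prints [b, a, d, c]
    }
}
```

Documents found by both searches ("a" and "b") rise above those found by only one, which is the usual motivation for fusing the two rankings.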

General checklist

  • [X] There are no breaking changes
  • [X] I have added unit and integration tests for my change
  • [X] I have manually run all the unit and integration tests in the module I have added/changed, and they are all green
  • [X] I have manually run all the unit and integration tests in the core and main modules, and they are all green

Checklist for changing existing embedding store integration

  • [X] I have manually verified that the {NameOfIntegration}EmbeddingStore works correctly with the data persisted using the latest released version of LangChain4j

hrhrng avatar Aug 25 '24 17:08 hrhrng

Hi @hrhrng, thanks a lot and sorry for the delay! You were right, I guess it is better to have it as a separate ContentRetriever implementation (the same way Azure AI search is done and the same way Elasticsearch is being implemented (in progress)).

BTW I've noticed there are a few other Postgres extensions for BM25/full-text search (e.g. pg_search, pg_bestmatch.rs, etc). Did you check/compare them? Did you also have a chance to use this implementation in real life?

Thank you!

langchain4j avatar Oct 29 '24 16:10 langchain4j

@langchain4j Sure, I think full-text search can be abstracted. I'll implement it with a GIN index (the default index type for full-text search in PostgreSQL) first. I do run several apps on PGVector with a GIN index, but the amount of data is not large, so they work just fine.
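
As a rough illustration of the GIN-based approach mentioned above, here is a minimal sketch of the SQL such a retriever might issue. It is not the code from this PR: the class `PgFullTextSql`, the index name, the `'english'` text search configuration, and the `embedding_id` and `text` column names are all assumptions. The PostgreSQL functions used (`to_tsvector`, `plainto_tsquery`, `ts_rank`) and the `@@` match operator are standard.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the implementation in this PR.
public class PgFullTextSql {

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // The table name is concatenated into SQL, so accept plain identifiers only.
    private static String checked(String table) {
        if (!IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("Unsafe table name: " + table);
        }
        return table;
    }

    // Expression index over the text column (column name is an assumption).
    public static String createGinIndex(String table) {
        String t = checked(table);
        return "CREATE INDEX IF NOT EXISTS " + t + "_text_gin ON " + t
                + " USING GIN (to_tsvector('english', text))";
    }

    // Parameterized query: bind the search phrase twice, then the row limit.
    // The WHERE expression matches the indexed expression so the index can be used.
    public static String search(String table) {
        String t = checked(table);
        return "SELECT embedding_id, text, "
                + "ts_rank(to_tsvector('english', text), plainto_tsquery('english', ?)) AS score "
                + "FROM " + t + " "
                + "WHERE to_tsvector('english', text) @@ plainto_tsquery('english', ?) "
                + "ORDER BY score DESC LIMIT ?";
    }

    public static void main(String[] args) {
        System.out.println(createGinIndex("items"));
        System.out.println(search("items"));
    }
}
```

The returned strings would be passed to a JDBC `PreparedStatement`; no database connection is opened here.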

hrhrng avatar Nov 19 '24 06:11 hrhrng

Hey @hrhrng, how is it going? This is quite an important feature, I hope we could support it soon. Thanks a lot for your help! 🙏

langchain4j avatar Dec 06 '24 12:12 langchain4j

@langchain4j Sorry, I've been a little busy. I’ll work on this feature next week!

hrhrng avatar Dec 06 '24 13:12 hrhrng

@hrhrng great, thank you a lot!

langchain4j avatar Dec 09 '24 14:12 langchain4j

Hi @langchain4j. I have changed the code according to the previous discussion; please check whether it meets expectations.

hrhrng avatar Dec 22 '24 15:12 hrhrng

@hrhrng thanks a lot and sorry for the late reply! I will try to review and merge it ASAP

dliubarskyi avatar Jan 31 '25 15:01 dliubarskyi

@hrhrng BTW did you use this feature already on real data? How do you find it?

dliubarskyi avatar Jan 31 '25 15:01 dliubarskyi

@hrhrng @dliubarskyi any updates on this? It's such a powerful feature that I'd like to start using soon.

sjivan avatar Apr 27 '25 08:04 sjivan

@hrhrng @dliubarskyi any updates on this please? it's a great feature.

naser-yousef avatar Jun 27 '25 12:06 naser-yousef

@hrhrng BTW did you use this feature already on real data? How do you find it?

@dliubarskyi Sorry for the late reply. I’ve been a bit busy recently. Actually, I haven’t tested this feature on real data yet; I only did some initial testing. Maybe you could invite someone else to try it with real data. Also, I’ve resolved the conflict, so please have a look. Let me know if you have any other suggestions.

hrhrng avatar Jul 20 '25 15:07 hrhrng