langchain icon indicating copy to clipboard operation
langchain copied to clipboard

Allow use of non-default QDrant vector key

Open mahmoudajawad opened this issue 2 years ago • 3 comments
trafficstars

  • Context #2594
  • This is continuation of #2751 which diverged so much from origin and was better to be re-started.

Fixes #2594

Who can review?

@kacperlukawski

mahmoudajawad avatar Jun 10 '23 04:06 mahmoudajawad

Hello, @kacperlukawski, I'm back on this and I would like to share my thought process:

  • Since last time I looked at this file in code-base, a lot has changed, and so I began with re-implementing the first part I started with.
  • Then, I began on the other part which I didn't implement which is allowing defining multiple vectors at time of injecting documents into QDrant (from_texts class method).
  • Then, I realised defining vector_key at class level prevents langchain users from using different vectors in collection as all queries will now use vector_key from class, so I moved vector_key to similarity_search method.
  • With that, I replaced vector_key from class constructor to vectors_config, and used it to check for its truthy value in similarity_search in order to raise either of two ValueError objects; QDrant class was initialised with vectors_config but value of vector_key wasn't provided, or vector_key is provided but it isn't one of the vectors defined in vectors_config.

With those changes:

  • Users of QDrant vector store, can now re-use their current collections with the current data, even if it was originally created outside of langchain, with any of the vectors collection was defined with.
  • It is possible to create a collection from_documents with multiple vectors now.
  • Original behaviour is preserved.

That said, I'm yet to figure out the effect of this when QDrant class is in chain, as I just made this and tested it as a standalone object, not in a chain.

Am I on a path that is accepted by the project standards?

mahmoudajawad avatar Jun 10 '23 05:06 mahmoudajawad

@kacperlukawski, any chance you had a look at this?

mahmoudajawad avatar Jun 16 '23 03:06 mahmoudajawad

@mahmoudajawad I had an initial look, but I want to cover it with some integration tests. I can hopefully do it over the weekend.

kacperlukawski avatar Jun 16 '23 05:06 kacperlukawski

@mahmoudajawad I tried to cover the PR with some integration tests, but unfortunately, I found some issues hard to solve. I'll start a new PR to bring that functionality.

kacperlukawski avatar Jun 28 '23 09:06 kacperlukawski

@kacperlukawski, I really would like to contribute this. I just need directions since I don't want to just put my code in the codebase without pre-approval of an acceptable approach.

If you could confirm to me that my approach in general is acceptable, I would re-create this PR again, add necessary integration tests, and submit it all again.

mahmoudajawad avatar Jun 28 '23 16:06 mahmoudajawad

@mahmoudajawad I just started a new PR that exposes just a vector name: #6871. Would appreciate your review on that.

kacperlukawski avatar Jun 28 '23 16:06 kacperlukawski

Closing this in favour of https://github.com/hwchase17/langchain/pull/6871

mahmoudajawad avatar Jun 28 '23 16:06 mahmoudajawad