langchain Implementation for Matching Engine Vectorstore

We just finished the implementation for the vector store using the GCP Matching Engine.

We'll be contributing the implementation.

Related to #2892

If you have any questions or suggestions, please contact me (@tomaspiaggio) or @scafati98.

Apr 18 '23 18:04 tomaspiaggio

I just pushed new updates addressing the comments. However, we were trying to add google-cloud-storage and google-cloud-aiplatform to the pyproject.toml, but we're running into dependency conflicts with black. Do you have any suggestions here? @dev2049

Apr 19 '23 15:04 tomaspiaggio

black is only a linting dependency, not a package dependency, so it shouldn't cause issues. I think you may have accidentally added it to the list of actual dependencies.

Apr 19 '23 16:04 dev2049

@hwchase17 I thought that was addressed with the from_components function. Could you say specifically what you need? I'm also not sure what you mean by arguments being passed around. Could you comment on that as well so I can fix it? Thank you!

Apr 20 '23 13:04 tomaspiaggio

I think he means making __init__ look something like what I mentioned here: https://github.com/hwchase17/langchain/pull/3104/files#r1170476602

Apr 20 '23 17:04 dev2049
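A generic sketch of the shape being discussed, for readers following along: `__init__` only stores components that are already built, and a `from_components` classmethod turns plain configuration values into those components. The linked review comment is not reproduced here, and every class name below (`IndexHandle`, `BucketHandle`, `MatchingEngineSketch`) is a hypothetical stand-in, not the PR's code or the Google client libraries.

```python
from dataclasses import dataclass


@dataclass
class IndexHandle:
    """Hypothetical stand-in for a Matching Engine index client."""
    index_id: str


@dataclass
class BucketHandle:
    """Hypothetical stand-in for a GCS bucket client."""
    bucket_name: str


class MatchingEngineSketch:
    """Vector store sketch whose __init__ receives ready-made components."""

    def __init__(self, project_id: str, index: IndexHandle, bucket: BucketHandle) -> None:
        # __init__ only wires components together: no parsing, no network calls.
        self.project_id = project_id
        self.index = index
        self.bucket = bucket

    @classmethod
    def from_components(
        cls, project_id: str, index_id: str, gcs_bucket_name: str
    ) -> "MatchingEngineSketch":
        # All construction work from plain config values lives here.
        index = IndexHandle(index_id=index_id)
        bucket = BucketHandle(bucket_name=gcs_bucket_name)
        return cls(project_id=project_id, index=index, bucket=bucket)
```

Keeping construction logic out of `__init__` means a test can pass fakes in directly, while normal callers go through `from_components("my-project", "my-index", "my-bucket")`.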

@dev2049 I already added the from_components function, and I agree it is a better approach. The methods called in the constructor validate the gcs_bucket_name and check that the client libraries are installed. Sorry if I'm misunderstanding what you mean.

Apr 20 '23 17:04 tomaspiaggio
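The two kinds of constructor checks mentioned above could look roughly like this. It is a minimal sketch, not the PR's code: the helper names are hypothetical, accepting a `gs://` prefix is an assumed convenience, and `google.cloud.storage` / `google.cloud.aiplatform` are assumed to be the modules installed by google-cloud-storage and google-cloud-aiplatform.

```python
import importlib.util
from typing import Iterable


def normalize_gcs_bucket_name(gcs_bucket_name: str) -> str:
    """Accept 'my-bucket' or 'gs://my-bucket'; reject values that include a path."""
    name = gcs_bucket_name
    if name.startswith("gs://"):
        name = name[len("gs://"):]
    if not name or "/" in name:
        raise ValueError(
            f"gcs_bucket_name should be a bare bucket name, got {gcs_bucket_name!r}"
        )
    return name


def is_installed(module_name: str) -> bool:
    """True if module_name can be located on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec imports the parents of a dotted name; a missing parent raises.
        return False


def require_modules(module_names: Iterable[str]) -> None:
    """Raise ImportError naming every module that cannot be found."""
    missing = [name for name in module_names if not is_installed(name)]
    if missing:
        raise ImportError(
            "Missing modules: " + ", ".join(missing)
            + ". Try: pip install google-cloud-storage google-cloud-aiplatform"
        )
```

A constructor would call `require_modules(["google.cloud.storage", "google.cloud.aiplatform"])` and `normalize_gcs_bucket_name(...)` before touching any client.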

I just meant you should update the __init__ params, which it looks like you did in https://github.com/hwchase17/langchain/pull/3104/commits/2f946f548d502958b97143719e1b36da6f01b05a 🙏!

Apr 21 '23 01:04 dev2049

Great, @dev2049! Is there anything else you need from me for the merge?

Apr 22 '23 01:04 tomaspiaggio

@hwchase17 any chance of getting this into a release soon?

May 07 '23 13:05 meal

@hwchase17 Same question here: it would be nice to see this released.

May 21 '23 18:05 eugenemiretsky

One concern is that the docs are stored in and retrieved from GCS, which is slow (and somewhat defeats the purpose of using a vector DB).

May 21 '23 18:05 eugenemiretsky
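To make the latency concern concrete: assuming, as the comment implies, that the index returns only neighbor IDs and each document's text lives in its own GCS object, every query costs one index lookup plus one object read per returned neighbor. The sketch below models that with hypothetical in-memory fakes (`FakeIndex`, `FakeBlobStore`) and only counts round-trips; it does not measure real latency or reproduce the integration's code.

```python
from typing import Dict, List


class FakeIndex:
    """Hypothetical index: returns nearest-neighbor IDs, never document text."""

    def __init__(self, ids: List[str]) -> None:
        self.ids = ids
        self.calls = 0

    def find_neighbors(self, k: int) -> List[str]:
        self.calls += 1
        return self.ids[:k]


class FakeBlobStore:
    """Hypothetical object store holding one object per document, keyed by ID."""

    def __init__(self, objects: Dict[str, str]) -> None:
        self.objects = objects
        self.reads = 0

    def read(self, key: str) -> str:
        self.reads += 1  # in the real setup, each read is a separate GCS request
        return self.objects[key]


def search(index: FakeIndex, store: FakeBlobStore, k: int) -> List[str]:
    """Two-hop lookup: IDs from the index, then one store read per ID."""
    neighbor_ids = index.find_neighbors(k)
    return [store.read(doc_id) for doc_id in neighbor_ids]
```

With `k` neighbors, a query makes one index call and `k` store reads, so the extra round-trips grow linearly with `k`.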

@tomaspiaggio should you create a PR from your branch to master?

May 24 '23 11:05 eugenemiretsky

@hwchase17 Any updates on this one? Would be a cool feature!

May 25 '23 19:05 olaf-hoops

Will this be merged to master? @hwchase17

May 30 '23 15:05 tomaspiaggio

Keen to get this merged into master @hwchase17

May 31 '23 03:05 HarrisonKhannah

Once the Matching Engine index is deployed, what is the best retriever in langchain for getting the query results? @tomaspiaggio

Jun 01 '23 10:06 ramssai

I've been using Vector Search (Matching Engine) with langchain for a couple of days now, and I've been hitting my head against a wall trying to solve a problem.

I notice that when embeddings are sent to Vector Search they get stored and a file is also created and stored within a separate GCS bucket that is referenced when queried.

I am looking for a way to remove embeddings from Vector Search, but it seems I can only do that with gcloud commands, and those require the datapoint_ids.

What would be the best way to store the datapoint_ids that are related to the documents that are being embedded?

Nov 30 '23 23:11 ktibbs9417
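One possible approach, sketched under assumptions: langchain's base `VectorStore.add_texts` interface is documented as returning the IDs of the added texts, so, assuming the Matching Engine store follows it and those IDs are the datapoint IDs stored in the index, a caller can record them per source document at ingestion time and look them up when deleting. The `IdRegistry` class, the JSON sidecar file, and the source keys are hypothetical and not part of the integration.

```python
import json
from pathlib import Path
from typing import Dict, List


class IdRegistry:
    """Hypothetical sidecar mapping: source document -> datapoint IDs."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.mapping: Dict[str, List[str]] = {}
        if path.exists():
            self.mapping = json.loads(path.read_text())

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.mapping, indent=2))

    def record(self, source: str, datapoint_ids: List[str]) -> None:
        # Extend rather than overwrite, so later chunks of the same source are kept.
        self.mapping.setdefault(source, []).extend(datapoint_ids)
        self._save()

    def pop(self, source: str) -> List[str]:
        # Return the IDs to remove from the index and forget them locally.
        ids = self.mapping.pop(source, [])
        self._save()
        return ids
```

At ingestion you would do something like `registry.record("handbook.pdf", store.add_texts(chunks))`, and later pass `registry.pop("handbook.pdf")` to whichever removal command you use.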
