langchain
Implementation for Matching Engine Vectorstore
We just finished the implementation for the vector store using the GCP Matching Engine.
We'll be contributing the implementation.
Related to #2892
If you have any questions or suggestions please contact me (@tomaspiaggio) or @scafati98.
I just pushed new updates addressing the comments. However, we were trying to add google-cloud-storage and google-cloud-aiplatform to the pyproject.toml, but we're having dependency conflicts with black. Do you have any suggestions here? @dev2049
black is only a linting dependency, not a package dependency, so shouldn't cause issues. think you may have accidentally added it to list of actual dependencies
@hwchase17 I thought that was addressed with the from_components function. Could you say specifically what you need? I'm also not sure what you mean by arguments being passed around. Could you comment on that as well so I can fix it? Thank you!
think he means to make __init__ look something like what i mentioned here https://github.com/hwchase17/langchain/pull/3104/files#r1170476602
@dev2049 I already added the from_components function and I agree it is a better approach. The methods called in the constructor are validations for the gcs_bucket_name and that the client libraries are installed. I'm sorry if I'm not understanding what you mean.
i just meant you should update __init__ params, which it looks like you did in https://github.com/hwchase17/langchain/pull/3104/commits/2f946f548d502958b97143719e1b36da6f01b05a 🙏 !
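For readers following the thread, the pattern under discussion can be sketched roughly as follows. This is not the code from the PR: `SketchVectorStore`, `FakeIndex`, and `_validate_bucket_name` are hypothetical names used only for illustration, and the bucket-name check is an assumed rule. The idea is that `__init__` takes ready-made components and only validates them, while `from_components` does the construction.

```python
from dataclasses import dataclass


@dataclass
class FakeIndex:
    """Hypothetical stand-in for a deployed index client."""

    index_id: str


class SketchVectorStore:
    """Illustrative only; not the class from the PR."""

    def __init__(self, index: FakeIndex, gcs_bucket_name: str):
        # The constructor receives ready-made components and only validates them.
        self.index = index
        self.gcs_bucket_name = self._validate_bucket_name(gcs_bucket_name)

    @staticmethod
    def _validate_bucket_name(name: str) -> str:
        # Assumed rule: accept "gs://bucket" or "bucket", reject empty names and paths.
        if name.startswith("gs://"):
            name = name[len("gs://"):]
        if not name or "/" in name:
            raise ValueError(f"invalid bucket name: {name!r}")
        return name

    @classmethod
    def from_components(cls, index_id: str, gcs_bucket_name: str) -> "SketchVectorStore":
        # Build the components here so that __init__ stays simple.
        return cls(index=FakeIndex(index_id), gcs_bucket_name=gcs_bucket_name)
```

Callers that already hold a client would use the constructor directly; everyone else would go through `SketchVectorStore.from_components(...)`.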
Great @dev2049 !! So do you need me to do anything else for the merge?
@hwchase17 any chance to get this into release anytime soon?
@hwchase17 Same question here: Would be nice to see this released
One concern is that the docs are stored in and retrieved from GCS, which is slow (and somewhat defeats the purpose of using a Vector DB)
@tomaspiaggio should you create a PR from your branch to master?
@hwchase17 Any updates on this one? Would be a cool feature!
Will this be merged to master? @hwchase17
Keen to get this merged into master @hwchase17
Once the Matching Engine index is deployed, what is the best retriever in langchain to get the query results? @tomaspiaggio
Have been using the Vector Search (Matching Engine) with langchain for a couple of days now and I've been hitting my head against a wall to solve a problem.
I notice that when embeddings are sent to Vector Search they get stored and a file is also created and stored within a separate GCS bucket that is referenced when queried.
I am looking for a way to remove the embeddings from the Vector Search but it seems I can only do it with gcloud commands but I need to know the datapoint_ids.
What would be the best way to store the datapoint_ids that are related to the documents that are being embedded?
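One possible approach, as a minimal sketch and not something the langchain integration is known to provide: generate the IDs yourself before uploading and keep a small mapping from each source document to its datapoint_ids. `DatapointRegistry`, its methods, and the JSON file layout below are hypothetical, and the sketch assumes you control the IDs that get attached to each embedding when it is sent to Vector Search.

```python
import json
import uuid
from pathlib import Path
from typing import Dict, List


class DatapointRegistry:
    """Hypothetical helper: maps a source document key to its datapoint IDs."""

    def __init__(self, path: Path):
        self.path = path
        # Load an existing mapping if the file is already there.
        self._ids: Dict[str, List[str]] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def register(self, doc_key: str, num_chunks: int) -> List[str]:
        # Generate one ID per chunk up front, so the IDs are known before upload.
        new_ids = [str(uuid.uuid4()) for _ in range(num_chunks)]
        self._ids.setdefault(doc_key, []).extend(new_ids)
        self.path.write_text(json.dumps(self._ids))
        return new_ids

    def pop(self, doc_key: str) -> List[str]:
        # Return and forget the IDs of a document that is being removed.
        ids = self._ids.pop(doc_key, [])
        self.path.write_text(json.dumps(self._ids))
        return ids
```

When a document has to be removed, `pop("report.pdf")` would return the IDs to pass to whichever removal command or API you use. A local JSON file is only for illustration; the same mapping could live in a database or next to the documents in GCS.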