Astra DB - collection bug
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain.js documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain.js rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
import {
  AstraDBVectorStore,
  AstraLibArgs,
} from '@langchain/community/vectorstores/astradb'
import { formatDocumentsAsString } from 'langchain/util/document'
import { GoogleVertexAIEmbeddings } from '@langchain/community/embeddings/googlevertexai'

const VertexAIEmbeddings = new GoogleVertexAIEmbeddings()

const getAstraDBRetriever = async () => {
  const astraConfig: AstraLibArgs = {
    token: '{{token}}',
    endpoint: '{{endpoint}}',
    collection: '{{collection}}',
  }
  const store = await AstraDBVectorStore.fromExistingIndex(VertexAIEmbeddings, astraConfig)
  return store.asRetriever({
    k: topK,
    searchType: 'similarity',
    metadata,
  }).pipe(formatDocumentsAsString)
}
Error Message and Stack Trace (if applicable)
"Collection 'default_keyspace.my_testing_collection' already exists CollectionAlreadyExistsError: Collection 'default_keyspace.my_testing_collection' already exists\n at Db.createCollection (/Users/dev/github/web/common/temp/node_modules/.pnpm/@[email protected]/node_modules/@datastax/astra-db-ts/dist/data-api/db.js:343:23)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async AstraDBVectorStore.initialize (/Users/dev/github/web/common/temp/node_modules/.pnpm/@[email protected]_@[email protected][email protected]/node_modules/@langchain/community/dist/vectorstores/astradb.cjs:105:9)\n at async AstraDBVectorStore.fromExistingIndex (/Users/dev/github/web/common/temp/node_modules/.pnpm/@[email protected]_@[email protected][email protected]/node_modules/@langchain/community/dist/vectorstores/astradb.cjs:271:9)"
Description
I'm trying to use Astra DB as the vector store, but it cannot be initialized.
The bug is coming from here: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/astradb.ts#L94
When the collection named by collectionName already exists, the code throws an error and stops the connection process instead of moving on to the next line and connecting to the collection.
System Info
"@langchain/community": "~0.0.49",
"@datastax/astra-db-ts": "~1.0.1",
To address the CollectionAlreadyExistsError when initializing Astra as the vector store in LangChain.js, you'll need to modify the initialize method in the AstraDBVectorStore class. The modification involves adding a check to see if the collection already exists before attempting to create it. If the collection exists, the method should connect to it instead of trying to create a new one. This can be done by catching the error that indicates the collection already exists and then connecting to the existing collection.
Here's a suggested modification for the initialize method:
async initialize(): Promise<void> {
  try {
    await this.astraDBClient.createCollection(
      this.collectionName,
      this.collectionOptions
    );
  } catch (error) {
    if (error instanceof CollectionAlreadyExistsError) {
      console.debug("Collection already exists, connecting to it.");
    } else {
      throw error; // Rethrow if it's not the specific error we're looking for
    }
  }
  this.collection = await this.astraDBClient.collection(this.collectionName);
  console.debug("Connected to Astra DB collection");
}
Ensure that CollectionAlreadyExistsError is correctly identified based on the actual error handling provided by @datastax/astra-db-ts. Adjust the catch block accordingly if the error class or mechanism differs.
This approach should help you bypass the error and connect to an existing collection without attempting to recreate it.
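The catch-and-connect pattern described above can be tried in isolation. The following is a minimal sketch that uses an in-memory stand-in for the database client. FakeDb, FakeCollection, connectOrCreate, and the local CollectionAlreadyExistsError class are hypothetical names that only mimic the behaviour reported in this issue; they are not the @datastax/astra-db-ts API.

```typescript
// Hypothetical stand-ins for illustration; not the real @datastax/astra-db-ts classes.
class CollectionAlreadyExistsError extends Error {
  constructor(name: string) {
    super(`Collection '${name}' already exists`);
    this.name = "CollectionAlreadyExistsError";
  }
}

class FakeCollection {
  constructor(public readonly name: string) {}
}

class FakeDb {
  private names = new Set<string>();

  constructor(existing: string[] = []) {
    existing.forEach((n) => this.names.add(n));
  }

  // Mimics the reported behaviour: creating an existing collection throws.
  async createCollection(name: string): Promise<FakeCollection> {
    if (this.names.has(name)) throw new CollectionAlreadyExistsError(name);
    this.names.add(name);
    return new FakeCollection(name);
  }

  // Returns a handle to the named collection without creating anything.
  async collection(name: string): Promise<FakeCollection> {
    return new FakeCollection(name);
  }
}

// Create the collection if it is missing, otherwise connect to the existing one.
async function connectOrCreate(db: FakeDb, name: string): Promise<FakeCollection> {
  try {
    await db.createCollection(name);
  } catch (error) {
    // Only swallow the "already exists" case; rethrow anything else.
    if (!(error instanceof CollectionAlreadyExistsError)) throw error;
  }
  return db.collection(name);
}
```

With this shape, calling connectOrCreate for a collection that already exists returns a handle instead of aborting, while unrelated failures still surface to the caller.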
Hey @jinchi2013, you can bypass the collection check by setting checkExists: false:
const astraConfig: AstraLibArgs = {
  token: '{{token}}',
  endpoint: '{{endpoint}}',
  collection: '{{collection}}',
  collectionOptions: {
    checkExists: false
  }
}
Also note that you're creating the collection without specifying the vector options so it won't be able to store embeddings.
You can see the collectionOptions and a full example in the LangChainJS documentation for AstraDB
@nicoloboschi Thank you. I don't want to create a new collection; I want to use the existing collection. I tested checkExists: false, and I'm getting a new error now:
Invalid collection name: provided collection ({{ collection }}) already exists with different collection options DataAPIResponseError: Invalid collection name: provided collection ({{ collection }}) already exists with different collection options
@jinchi2013 it's likely that the collection options have changed since the collection was first created. Since you're using VertexAIEmbeddings, you need to set the dimension in collectionOptions and enable the vector column. To do that, change the code like this:
const astraConfig: AstraLibArgs = {
  token: '{{token}}',
  endpoint: '{{endpoint}}',
  collection: '{{collection}}',
  collectionOptions: {
    checkExists: false,
    vector: {
      dimension: 768, // number of dimensions produced by textembedding-gecko
      metric: "cosine",
    },
  }
}
I'd suggest starting over (delete the table from the UI) and running the code again.
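Because a vector collection is created with a fixed dimension, one quick sanity check before connecting is to embed a probe string and compare the vector length with the dimension the collection was created with. This is a minimal sketch that assumes only that the embeddings object exposes embedQuery(text), as LangChain embeddings classes do. QueryEmbedder, checkEmbeddingDimension, and FakeEmbeddings are hypothetical names used for illustration, and 768 is the dimension quoted above.

```typescript
// Minimal shape of an embeddings object; LangChain embeddings expose embedQuery().
interface QueryEmbedder {
  embedQuery(text: string): Promise<number[]>;
}

// Hypothetical helper: resolves to null when the dimensions agree,
// otherwise to a message describing the mismatch.
async function checkEmbeddingDimension(
  embedder: QueryEmbedder,
  expectedDimension: number
): Promise<string | null> {
  const vector = await embedder.embedQuery("dimension probe");
  return vector.length === expectedDimension
    ? null
    : `Embedding has ${vector.length} dimensions, collection expects ${expectedDimension}`;
}

// Stand-in embedder for demonstration; a real one would call the embedding model.
class FakeEmbeddings implements QueryEmbedder {
  constructor(private readonly dimension: number) {}

  async embedQuery(_text: string): Promise<number[]> {
    return new Array<number>(this.dimension).fill(0);
  }
}
```

In real use, the GoogleVertexAIEmbeddings instance from the example code would be passed in place of FakeEmbeddings, and a non-null result would point to a mismatch between the model and the collection's vector options.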
@nicoloboschi I didn't create this collection. It already exists and is also used for other purposes, so I don't think deleting and recreating it is an option here. Is there no way for me to access an existing collection?
@jinchi2013 it looks like you need to match up the embeddings model that was used to create the vector store and then the collection options will match. Try setting the embeddings model specifically like this:
VertexAIEmbeddings(model_name="textembedding-gecko")
@CharnaParkey VertexAI uses textembedding-gecko as its default model. See here: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/embeddings/googlevertexai.ts#L76
@jinchi2013 The collection has been created with the wrong configuration and cannot be used with VertexAIEmbeddings. My suggestion is to ask the colleague who created the table to use the above collectionOptions.
They can use whatever method is supported, but the collectionOptions have to be the same as the ones I posted in the comment.
After that, you can safely run the suggested code.
I can use the Python version of LangChain to access the same collection. What makes the JS version so special?
There is no issue with the Python code below:
embeddings = VertexAIEmbeddings(model_name="textembedding-gecko")
vector_store = AstraDBVectorStore(
    token=os.getenv("TOKEN"),
    api_endpoint=os.getenv("ENDPOINT"),
    collection_name="{{ collection_name }}",
    embedding=embeddings
)
vector_store.similarity_search("search something", k=3)
This is now resolved by https://github.com/langchain-ai/langchainjs/pull/5142, https://github.com/langchain-ai/langchainjs/pull/5170, and https://github.com/langchain-ai/langchainjs/pull/5185