Inspiration: weaviate vectorstore

Open JHeidinga opened this issue 1 year ago • 3 comments

Not really an issue, more of an inspiration for a Weaviate vector store:

import { Embeddings } from "langchain/embeddings";
import { VectorStore } from "langchain/vectorstores";
import { IWeaviateClient } from "weaviate-ts-client";
import { Document } from 'langchain/document';
import { uuid } from 'uuidv4';

type WeaviateStoreArgs = {
    client: IWeaviateClient
    indexName: string
    textKey: string
    attributes?: string[]
}

export class WeaviateStore extends VectorStore {
    private client: IWeaviateClient
    private indexName: string
    private textKey: string
    private queryAttrs: string[]

    constructor(public embeddings: Embeddings, args: WeaviateStoreArgs) {
        super(embeddings, args)

        this.client = args.client
        this.indexName = args.indexName
        this.textKey = args.textKey
        this.queryAttrs = [this.textKey]

        if (args.attributes) {
            this.queryAttrs = this.queryAttrs.concat(args.attributes)
        }
    }

    addVectors(vectors: number[][], documents: Document[]): Promise<void> {
        throw new Error("Not Implemented");
    }
    async addDocuments(documents: Document[]): Promise<void> {
        // Weaviate batch objects carry their data under `properties`
        const batch = documents.map(document => ({
            class: this.indexName,
            id: uuid(),
            properties: {
                [this.textKey]: document.pageContent,
                ...document.metadata
            }
        }))

        try {
            await this.client.batch
                .objectsBatcher()
                .withObjects(batch)
                .do()
        } catch (e) {
            throw Error(`Error in addDocuments: ${e}`)
        }
    }
    similaritySearchVectorWithScore(query: number[], k: number, filter?: object): Promise<[Document, number][]> {
        throw new Error("Not Implemented");
    }

    async similaritySearch(query: string, k = 4, filter?: Record<string, any> | undefined): Promise<Document[]> {
        const content: {
            concepts: string[],
            certainty?: number
        } = {
            concepts: [query]
        };

        if (filter?.searchDistance) {
            content.certainty = filter.searchDistance;
        }

        try {
            const result = await this.client.graphql
                .get()
                .withClassName(this.indexName)
                .withFields(this.queryAttrs.join(" "))
                .withNearText(content) // pass the object built above so the optional certainty is applied
                .withLimit(k)
                .do()

            const documents = [];
            for (const data of result.data.Get[this.indexName]) {
                const record: Record<string, any> = data as any
                const text = record[this.textKey];
                delete record[this.textKey];

                documents.push(new Document({
                    pageContent: text,
                    metadata: record
                }));
            }
            return documents;
        } catch (e) {
            throw Error(`Error in similaritySearch: ${e}`)
        }
    }

    similaritySearchWithScore(query: string, k?: number, filter?: object | undefined): Promise<[Document, number][]> {
        throw Error("Not Implemented");
    }

    static fromTexts(texts: string[], metadatas: object[], embeddings: Embeddings, args: WeaviateStoreArgs): Promise<VectorStore> {

        const docs = texts.map((text, index) =>
            new Document({
                pageContent: text,
                metadata: metadatas[index]
            })
        )
        return WeaviateStore.fromDocuments(docs, embeddings, args)
    }

    static async fromDocuments(docs: Document[], embeddings: Embeddings, args: WeaviateStoreArgs): Promise<VectorStore> {
        const instance = new this(embeddings, args);
        await instance.addDocuments(docs);
        return instance;
    }
}
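
A minimal, self-contained sketch of the document-to-batch mapping that `addDocuments` performs, pulled out as a pure function so it can be checked without a running Weaviate instance. The names `SimpleDocument`, `BatchObject`, and `toBatchObjects` are hypothetical and exist only for this illustration. It assumes the batch object shape `{ class, id, properties }`, so verify that against the client version you use.

```typescript
// Hypothetical shapes for illustration; these are not the library's own types.
interface SimpleDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface BatchObject {
  class: string;
  id: string;
  properties: Record<string, unknown>;
}

// Maps documents to batch objects, nesting text and metadata under `properties`.
// `makeId` is injected so callers can plug in any UUID generator.
function toBatchObjects(
  docs: SimpleDocument[],
  indexName: string,
  textKey: string,
  makeId: () => string
): BatchObject[] {
  return docs.map((doc) => ({
    class: indexName,
    id: makeId(),
    // Metadata is spread first so a metadata key cannot overwrite the text key.
    properties: { ...doc.metadata, [textKey]: doc.pageContent },
  }));
}

const batch = toBatchObjects(
  [{ pageContent: "hello", metadata: { source: "a.txt" } }],
  "Article",
  "text",
  () => "00000000-0000-0000-0000-000000000001" // fixed ID, only for the example
);
console.log(JSON.stringify(batch));
```

Spreading the metadata before the text key is a design choice of this sketch: if a document's metadata happens to contain a key equal to `textKey`, the page content still wins.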

JHeidinga • Mar 30 '23 14:03

That would be amazing! It's really needed.

@JHeidinga Please submit this as a PR, so it's official and maintained.

mysticaltech • Apr 10 '23 03:04

@mysticaltech Here you go: https://github.com/hwchase17/langchainjs/pull/708

JHeidinga • Apr 10 '23 11:04

Wonderful, thank you! 🚀

mysticaltech • Apr 11 '23 14:04

@JHeidinga Hi, does that mean we can use Weaviate without specifying an embedder (vectors)? weaviate/quickstart#option-1-vectorizer

steinathan • Jul 24 '23 19:07
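
For context on the question above: the class as posted searches with `nearText` and never calls `embeddings`, so turning the query into a vector would be left to a text vectorizer module configured on the Weaviate class. A minimal sketch of the argument that search hands to `.withNearText(...)` follows. The names `NearTextArgs` and `buildNearText` are hypothetical, and the `concepts` and `certainty` fields are assumed from the snippet in the original post rather than taken from the client's type definitions.

```typescript
// Hypothetical shape of the nearText argument, mirroring the original snippet.
interface NearTextArgs {
  concepts: string[];
  certainty?: number;
}

// Builds the nearText argument from a raw query string.
// No embedding is computed client-side: only the text is sent.
function buildNearText(query: string, certainty?: number): NearTextArgs {
  const args: NearTextArgs = { concepts: [query] };
  if (certainty !== undefined) {
    args.certainty = certainty; // optional minimum-certainty threshold
  }
  return args;
}

const plain = buildNearText("what is a vector store?");
const strict = buildNearText("what is a vector store?", 0.8);
console.log(JSON.stringify(plain), JSON.stringify(strict));
```

Whether this works without client-side vectors depends on the server setup: if the class has no vectorizer module enabled, a `nearText` query would not be available and vectors would have to be supplied explicitly.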

Hi, @JHeidinga! I'm Dosu, and I'm helping the langchainjs team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you shared an inspiration for a Weaviate vector store and even submitted it as a pull request. Mysticaltech expressed gratitude, indicating that the solution is satisfactory. However, navicstein had a question about using Weaviate without specifying an embedder, which remains unanswered.

Could you please let us know if this issue is still relevant to the latest version of the langchainjs repository? If it is, please comment on the issue to let us know. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution!

dosubot[bot] • Oct 23 '23 16:10