openai-cookbook
Initial commit of vector database example with new embeddings
This PR contains a notebook running through an example of using our embeddings to embed Simple Wikipedia and then indexing and searching it in both Weaviate and Pinecone.
Hi, would you mind adding Qdrant as another option? I can provide a working example similar to the created ones.
Qdrant (https://github.com/qdrant/qdrant) is a high-performance vector search database written in Rust. According to its published benchmarks, it is the fastest open-source solution available at the moment: https://qdrant.tech/benchmarks/
Couple more suggestions:
- Add installation instructions for pinecone, weaviate, and qdrant_client, as most people won't have them. I think it's especially valuable here since the package names and import names aren't the same. Looks like pinecone-client and weaviate-client and qdrant-client, vs pinecone, weaviate, and qdrant_client.
- Add qdrant to the table of contents/outline at the top
- After running pip install --upgrade pinecone-client, I immediately run into an error when importing it. Not sure why, but I want to figure it out before we merge.
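To illustrate the first suggestion, here is a minimal sketch of a check the notebook could run before its imports. It only uses the package and import names listed above; the helper name missing_packages is hypothetical and not part of the PR.

```python
import importlib.util

# pip package name -> import name, as listed in the suggestion above.
PACKAGES = {
    "pinecone-client": "pinecone",
    "weaviate-client": "weaviate",
    "qdrant-client": "qdrant_client",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose import name cannot be found locally.

    Hypothetical helper for illustration only.
    """
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print the install command rather than running it automatically.
        print("pip install --upgrade " + " ".join(missing))
    else:
        print("All vector database clients are importable.")
```

This only reports whether each module can be located; it would not catch an import-time error like the one described in the last bullet.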
^on the error I'm hitting, I've emailed pinecone support and will wait to hear back.
One last suggestion: I think it would be helpful if you precomputed the embeddings, stored them on our CDN, and let people download them, so that they don't have to pay $5 each time they run the example. This is what we've done in some of the other examples. Feel free to pick any file format you like. DM me to discuss how to upload to our CDN and what URL we'll want to give it.
Then, if everything runs on your end, we can merge. Thanks again for all the work on this!
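A rough sketch of what the download step could look like, assuming the precomputed embeddings end up as a single file at some CDN URL. The URL and file name below are placeholders, not real locations, and ensure_downloaded is a hypothetical helper.

```python
import os
import urllib.request

# Placeholder values: the real CDN URL and file format are still to be decided.
EMBEDDINGS_URL = "https://example.com/precomputed_embeddings.zip"
LOCAL_PATH = "precomputed_embeddings.zip"


def ensure_downloaded(url=EMBEDDINGS_URL, path=LOCAL_PATH,
                      fetch=urllib.request.urlretrieve):
    """Download url to path unless the file already exists, then return path.

    Hypothetical helper; `fetch` is injectable so the logic can be
    exercised without network access.
    """
    if not os.path.exists(path):
        fetch(url, path)
    return path
```

Calling ensure_downloaded() at the top of the notebook would skip the download on reruns, so readers neither re-embed the corpus nor re-fetch the file each time.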
@ted-at-openai these are now resolved