llama_index Example using an existing Pinecone index & namespace

Hey, is it possible to use an existing Pinecone index? In this example, you create a new one and index into it:

https://github.com/jerryjliu/gpt_index/blob/main/examples/vector_indices/PineconeIndexDemo.ipynb

Also, it's not clear whether we can use specific namespaces in Pinecone.

Thanks for the great lib anyway 🎸

Jan 20 '23 15:01 louis030195

You could try using the PineconeReader (not the Pinecone Index) to load docs from an existing Pinecone index (https://gpt-index.readthedocs.io/en/latest/how_to/vector_stores.html), and then feed those into a GPT index (e.g. a GPTSimpleVectorIndex or GPTListIndex).

The PineconeReader isn't perfect, though, so let me know your feedback on it.

Jan 22 '23 00:01 jerryjliu

@louis030195 did you have a chance to try this out?

Jan 22 '23 19:01 jerryjliu

Side question: I'd like to insert documents one by one into my Pinecone index, but all I see in the examples is SimpleDirectoryReader(...).load_data(). My source data isn't a directory of text files; it's a string that comes from a web (POST) request.

Here's my current code (simplified):

import pinecone
from gpt_index import Document, GPTPineconeIndex

pinecone.init(api_key="...", environment="...")
pinecone_index = pinecone.Index("...")
index = GPTPineconeIndex([], pinecone_index=pinecone_index)

newtext = """
removed to keep the code short
"""

index.insert(Document(text=newtext))

Is what I'm trying to do even possible? Should I use Pinecone directly to create/update the index, and use gpt_index for querying exclusively?

It may be obvious but I'm very new to all of this, sorry if it sounds dumb.

Jan 25 '23 10:01 bouiboui

See, I can make it work like this:

import pinecone
from gpt_index import Document, GPTPineconeIndex

pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENV)
index = GPTPineconeIndex(documents=[], pinecone_index=pinecone.Index('...'))

index.load_from_disk('../my_index.json')

index.insert(document=Document("document contents"))

index.save_to_disk('../my_index.json')

response = index.query("My question ?", verbose=True)

But I'm relying on a JSON file, the whole point of using Pinecone for me was to keep the data somewhere else, without touching the local filesystem. Is that possible?

Jan 25 '23 16:01 bouiboui

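from gpt_index import Document, GPTListIndex

# r is the result of a Pinecone query that asked for values and metadata, e.g.
# r = pinecone_index.query(vector=query_embedding, top_k=10, include_values=True, include_metadata=True)
docs = []
for m in r.matches:
    docs.append(Document(
        text=m.metadata["text"],
        doc_id=m.id,
        embedding=m.values,  # Pinecone returns each match's vector as `values`
    ))
list_index = GPTListIndex(docs)
list_index.query("What is the future of AI?")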

Here r is the result of a Pinecone query. Unfortunately, I don't know how to fetch all documents.

It's a feature, not a bug: I ended up using GPT Index on a subset of my Pinecone index, which is potentially interesting.

Jan 25 '23 18:01 louis030195

This works for me. It lets you do both: use an existing Pinecone index and a specific namespace.

import os
import pinecone
from gpt_index import GPTPineconeIndex
from gpt_index.data_structs.data_structs import PineconeIndexStruct
from IPython.display import Markdown, display

api_key = os.getenv("PINECONE_API_KEY")

pinecone.init(api_key=api_key, environment="us-east1-gcp")

# debug - verify connection
# print(pinecone.list_indexes())

# replace with your index and namespace
index_name = "your_index_name"
namespace = "your_namespace"

index = pinecone.Index(index_name)

# debug - verify index stats
# print(index.describe_index_stats())

# passing index_struct bypasses the 'creation' of index and sets it up for use
index = GPTPineconeIndex(pinecone_index=index,
                         index_struct=PineconeIndexStruct())

# (optional) required only if you want to query a specific namespace
query_kwargs = {
    "pinecone_kwargs": {"namespace": namespace}
}
response = index.query("What did the author do growing up?", verbose=True, **query_kwargs)
display(Markdown(f"<b>{response}</b>"))

Jan 26 '23 06:01 mahpat16

Just tried it and it works. I believe index_struct=PineconeIndexStruct() was the missing piece. Thank you so much!

Jan 26 '23 07:01 bouiboui

This almost works for me 😛 The problem is that my text isn't stored under the "text" metadata key, which is what this line reads:

https://github.com/jerryjliu/gpt_index/blob/cab30c4aec7b94c6d12a6efe3fb6b91a605f3869/gpt_index/indices/query/vector_store/pinecone.py#L79

Maybe there could be an option to customize which metadata key the text is read from, and also to handle the case where the text isn't stored in the Pinecone index metadata at all?
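One way the key could be made configurable, sketched as a standalone function. get_match_text and its text_keys parameter are hypothetical, not part of gpt_index (which, per the linked line, reads metadata["text"] directly); the example assumes metadata is a plain dict.

```python
from typing import Any, Dict, Sequence


def get_match_text(metadata: Dict[str, Any], text_keys: Sequence[str] = ("text",)) -> str:
    """Return the value stored under the first key in `text_keys` that is present.

    Hypothetical helper: the caller chooses which metadata key(s) hold the text.
    Raises KeyError listing the available keys so a misconfigured key is easy to spot.
    """
    for key in text_keys:
        if key in metadata:
            return str(metadata[key])
    raise KeyError(
        f"none of {list(text_keys)} found in metadata; available keys: {sorted(metadata)}"
    )


# A record whose text lives under "content" rather than "text".
meta = {"content": "Pinecone stores vectors.", "source": "notes.md"}
print(get_match_text(meta, text_keys=("text", "content")))  # Pinecone stores vectors.
```

Trying keys in order lets a single index serve records written by different pipelines, while the explicit KeyError avoids silently indexing empty text.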

Jan 26 '23 09:01 louis030195

Hey @bouiboui, sorry I missed this. Just a heads-up: when you use the PineconeIndex, your data is stored in Pinecone, not in the .json file. In fact, calling save_to_disk and load_from_disk on the PineconeIndex doesn't really do anything. Calling index.insert should work!

Jan 27 '23 22:01 jerryjliu

Going to close for now; let me know if anything specific pops up.

Feb 01 '23 01:02 jerryjliu

Hey @mahpat16, does this still work for you? I need to use namespaces from now on, and it doesn't seem to work.

query_kwargs = {
    "pinecone_kwargs": {
        # "namespace": namespace
    }
}
response = index.query(message, verbose=True, **query_kwargs)

This works great: it generates a natural-language response, like "I can't answer with the documents provided". But if I uncomment the "namespace": namespace line, it either returns something like "Empty Response" or it crashes:

text = match.metadata["text"]
TypeError: 'NoneType' object is not subscriptable
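The traceback means match.metadata was None for at least one match, so subscripting it with "text" fails. Why it was None here isn't established in this thread; one possibility (an assumption) is that the vectors in that namespace were written without metadata. A minimal defensive sketch: split_matches is a hypothetical helper, and the match objects are SimpleNamespace stand-ins that only mimic the id/metadata attributes of Pinecone query results.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def split_matches(matches: Iterable[Any], text_key: str = "text") -> Tuple[List[str], List[str]]:
    """Return (texts, ids_without_text) instead of failing on match.metadata["text"].

    Each match is assumed to expose `.id` and `.metadata`; `.metadata` may be None.
    """
    texts, missing = [], []
    for match in matches:
        metadata = match.metadata or {}  # None becomes {} so it can be checked safely
        if text_key in metadata:
            texts.append(metadata[text_key])
        else:
            missing.append(match.id)
    return texts, missing


matches = [
    SimpleNamespace(id="doc-1", metadata={"text": "Hello from the namespace."}),
    SimpleNamespace(id="doc-2", metadata=None),
]
texts, missing = split_matches(matches)
print(texts)    # ['Hello from the namespace.']
print(missing)  # ['doc-2']
```

Reporting the ids without text, rather than skipping them silently, makes it easier to tell "no metadata in this namespace" apart from "no relevant results".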

Feb 11 '23 23:02 bouiboui

@bouiboui I'm facing the same (or a similar) problem trying to get a multi-namespace index going.

If I pass pinecone_kwargs to the index function, records are added but no namespace is applied.
If I query with the namespace option, I get "Empty Response" back.
If I comment out the namespace option, I get results based on the records in the index that have no namespace.
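These symptoms are consistent with inserts and queries using different namespaces: Pinecone puts records written without a namespace into the default namespace (""), and a query scoped to a named namespace only searches that namespace. The toy in-memory store below is a hypothetical stand-in, not Pinecone's client, that reproduces the pattern.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyNamespacedStore:
    """Hypothetical in-memory namespaced vector store (not Pinecone's client).

    Records written without a namespace go to the default namespace "", and a query
    only searches the namespace it names.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, List[float]]] = defaultdict(dict)

    def upsert(self, doc_id: str, vector: List[float], namespace: str = "") -> None:
        self._data[namespace][doc_id] = vector

    def query(self, vector: List[float], top_k: int = 3, namespace: str = "") -> List[Tuple[str, float]]:
        def cosine(a: List[float], b: List[float]) -> float:
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

        # .get does not create an empty namespace as a side effect of querying
        scored = [(i, cosine(vector, v)) for i, v in self._data.get(namespace, {}).items()]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]


store = ToyNamespacedStore()
store.upsert("doc-1", [1.0, 0.0])                      # namespace argument forgotten
print(store.query([1.0, 0.0], namespace="project-a"))  # []
print(store.query([1.0, 0.0]))                         # [('doc-1', 1.0)]
```

The namespaced query finds nothing while the un-namespaced one does, which mirrors the "Empty Response" versus "results without a namespace" behaviour described above.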

Feb 16 '23 10:02 stefl

Resolved.

You now pass pinecone_kwargs when creating the index, rather than in the query/insert methods:

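from gpt_index import Document, GPTPineconeIndex
from gpt_index.data_structs.data_structs import PineconeIndexStruct
index = GPTPineconeIndex(pinecone_index=index,
                         index_struct=PineconeIndexStruct(),
                         pinecone_kwargs={"namespace": namespace})
index.insert(document=Document(text, doc_id=doc_id))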
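Passing the namespace once, at construction, means inserts and queries can no longer disagree about it. The sketch below shows that pattern in isolation; BoundIndex and RecordingClient are hypothetical names, not gpt_index's actual implementation, and the client simply records the keyword arguments it receives.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingClient:
    """Fake vector-store client that records every call it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def upsert(self, **kwargs: Any) -> None:
        self.calls.append(("upsert", kwargs))

    def query(self, **kwargs: Any) -> None:
        self.calls.append(("query", kwargs))


class BoundIndex:
    """Hypothetical wrapper: store extra client kwargs once, merge them into every call."""

    def __init__(self, client: RecordingClient, client_kwargs: Optional[Dict[str, Any]] = None) -> None:
        self.client = client
        self.client_kwargs = dict(client_kwargs or {})

    def insert(self, doc_id: str, vector: List[float]) -> None:
        self.client.upsert(vectors=[(doc_id, vector)], **self.client_kwargs)

    def query(self, vector: List[float], top_k: int = 3) -> None:
        self.client.query(vector=vector, top_k=top_k, **self.client_kwargs)


client = RecordingClient()
index = BoundIndex(client, client_kwargs={"namespace": "project-a"})
index.insert("doc-1", [1.0, 0.0])
index.query([1.0, 0.0])
print([kwargs["namespace"] for _, kwargs in client.calls])  # ['project-a', 'project-a']
```

Binding the kwargs on the object trades per-call flexibility for consistency: switching namespaces means creating a second index object.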

Feb 18 '23 09:02 stefl

It looks like there's been a refactor of the IndexStruct code and PineconeIndexStruct no longer exists, so this method doesn't work on the latest version of LlamaIndex.

Feb 26 '23 09:02 stefl

Hopefully this link helps: https://discord.com/channels/1059199217496772688/1059200010622873741/1079094097916211270

Feb 26 '23 09:02 jerryjliu

@jerryjliu I tried to make a branch that uses what's suggested there, but without success.

I've resolved this temporarily by pinning to gpt-index==0.4.5 in my requirements.txt

Happy to contribute to this feature once the refactoring has stabilized!
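To make a pin like gpt-index==0.4.5 fail loudly when an environment drifts, a small standard-library check can compare the installed version against it. installed_version and check_pin are hypothetical helper names; only importlib.metadata is a real API here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it isn't installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_pin(dist_name: str, pinned: str) -> bool:
    """True only if `dist_name` is installed at exactly the `pinned` version."""
    return installed_version(dist_name) == pinned


# Warn early if the environment no longer matches the pin in requirements.txt.
if not check_pin("gpt-index", "0.4.5"):
    print(f"expected gpt-index==0.4.5, found {installed_version('gpt-index')}")
```

Running this at startup surfaces a missing PineconeIndexStruct as a version mismatch instead of an ImportError deep inside the app.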

Feb 26 '23 17:02 stefl
