
Asks for OPENAI_KEY when it's not needed

Open mjp0 opened this issue 1 year ago

I'm playing around with Qdrant as a vector store index using sentence-transformer embeddings from HuggingFace.

However, when I try to create an index, I get the error message "Did not find openai_api_key, please add an environment variable OPENAI_API_KEY which contains it, or pass openai_api_key as a named parameter." I'm not using any OpenAI API, so why is llama-index asking for the key?

model_name = "sentence-transformers/all-mpnet-base-v2"
embed_model_lang = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_name))   
data = "hello world"
documents = embed_model_lang.get_text_embedding(data) 
index = GPTQdrantIndex(documents, collection_name="test", client=client, embed_model=embed_model_lang)

mjp0 avatar Mar 23 '23 11:03 mjp0

@mjp0 it still needs an LLM to operate. You've specified embeddings, but it's initializing the default LLM, which is text-davinci-003 (needed to generate natural language responses to queries over your documents)

If you want to use a custom LLM, there is a guide here: https://gpt-index.readthedocs.io/en/latest/how_to/custom_llms.html#example-using-a-custom-llm-model

If you don't need an LLM or natural language responses, you can do os.environ["OPENAI_API_KEY"] = "random" and then use response = index.query("My query", response_mode="no_text") to skip calling the LLM. Then, you can check response.source_nodes for the matching node(s)

(also, just a side note: you might want to pass in the actual documents instead of embedding the text first; the index will embed them for you using the supplied embed_model)
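Putting both notes together, a minimal sketch for the llama-index version of that era (the GPTQdrantIndex call is taken from the snippet above; `client` is your qdrant client, and exact import paths varied across early releases):

import os
os.environ["OPENAI_API_KEY"] = "random"  # dummy value; with no_text the LLM is never actually called

from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import Document, GPTQdrantIndex, LangchainEmbedding

embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

# pass raw Documents; the index computes embeddings itself via embed_model
documents = [Document("hello world")]
index = GPTQdrantIndex(documents, collection_name="test", client=client, embed_model=embed_model)

# no_text skips the response-synthesis LLM call entirely
response = index.query("My query", response_mode="no_text")
print(response.source_nodes)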

logan-markewich avatar Mar 23 '23 15:03 logan-markewich

@mjp0 if you have more questions beyond @logan-markewich's super helpful response, please join the discord community (https://discord.gg/dGcwcsnxhU) and continue the discussion there!

Disiok avatar Mar 25 '23 02:03 Disiok

Can you share an example of how to create embeddings with your documents WITHOUT an openai api key? Reading this repository, I get the feeling that things work ONLY with openai ... PS: Curiously, the link about how to make a custom LLM in the answer above does not work

AngelTs avatar May 17 '23 09:05 AngelTs

There are two models, the embed model and the llm

You'll need to set both in the service context, and pass your service context into the index

https://colab.research.google.com/drive/16QMQePkONNlDpgiltOi7oRQgmB8dU5fl?usp=sharing
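In outline, the wiring looks like this (a sketch for llama-index versions of that era, ~v0.6; `llm_predictor`, `embed_model`, and `documents` are assumed to be built as in the notebook):

from llama_index import GPTVectorStoreIndex, ServiceContext

# both models are configured on the service context
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,  # your local LLM wrapper
    embed_model=embed_model,      # your local embedding model
)

# the service context is then handed to the index at construction time
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)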

logan-markewich avatar May 17 '23 13:05 logan-markewich

> There are two models, the embed model and the llm
>
> You'll need to set both in the service context, and pass your service context into the index
>
> https://colab.research.google.com/drive/16QMQePkONNlDpgiltOi7oRQgmB8dU5fl?usp=sharing

Thank you for the answer, but a simple example is still missing. Am I wrong to conclude that embedding is a proprietary technology of OpenAI and there is no way to use it with other LLMs? I can't find a simple example of a python script using embeddings WITHOUT the OpenAI API. The only alternative is the extensible retrieval augmented system of OpenChatKit, BUT it is in an experimental stage

AngelTs avatar May 17 '23 14:05 AngelTs

The colab notebook I linked uses local huggingface embeddings...

logan-markewich avatar May 17 '23 14:05 logan-markewich

Remember, you need to set both an embedding model AND an llm predictor to avoid openai. That colab notebook does both. Our docs also have examples of setting up each one

logan-markewich avatar May 17 '23 15:05 logan-markewich

# assumed imports for llama-index ~v0.6 (exact paths shifted between versions)
import torch
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext
from llama_index.llm_predictor import HuggingFaceLLMPredictor
from llama_index.prompts.prompts import SimpleInputPrompt

# instruction format from the camel-5b model card; wraps llama-index's internal prompts
query_wrapper_prompt = SimpleInputPrompt(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)

hf_predictor = HuggingFaceLLMPredictor(
    max_input_size=2048,
    max_new_tokens=256,
    temperature=0.25,
    do_sample=False,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="Writer/camel-5b-hf",
    model_name="Writer/camel-5b-hf",
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},
    model_kwargs={"torch_dtype": torch.bfloat16}
)

embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))

service_context = ServiceContext.from_defaults(chunk_size_limit=512, llm_predictor=hf_predictor, embed_model=embed_model)

This runs everything locally, on your machine, no openai
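To put the service context to use, something along these lines (a sketch; `documents` is whatever you've loaded, and the query-engine call is the v0.6-era usage):

from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
print(query_engine.query("What is this text about?"))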

logan-markewich avatar May 17 '23 15:05 logan-markewich

can anyone comment on whether we can run everything in an offline environment where we have already downloaded the models, so there's no need for a connection to huggingface.co?

aniruddhs07 avatar Jun 27 '23 13:06 aniruddhs07

> The colab notebook I linked uses local huggingface embeddings...

Your colab notebook example requires an openai api key, which is the opposite of the question ...

AuthenticationError                       Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in __call__(self, fn, *args, **kwargs)
    381         try:
--> 382             result = fn(*args, **kwargs)
    383         except BaseException:  # noqa: B902

18 frames

AuthenticationError: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. You can generate API keys in the OpenAI web interface. See https://platform.openai.com/account/api-keys for details.

AngelTs avatar Jul 17 '23 06:07 AngelTs

@AngelTs once again, of course you can use local models without openai

While this thread is pretty old now, here's a detailed example right from our docs, with the updated usage for v0.7.9. Much of the LLM setup is specific to StableLM; you'll want to see the model card on huggingface for full setup details

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

embed_model = LangchainEmbedding(
  HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

from llama_index.prompts import Prompt

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = Prompt("<|USER|>{query_str}<|ASSISTANT|>")

import torch
from llama_index.llms import HuggingFaceLLM
llm = HuggingFaceLLM(
    context_window=4096, 
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embed_model
)

# set global settings
from llama_index import set_global_service_context 
set_global_service_context(service_context)
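With the global service context set, index construction and queries pick it up automatically (a short usage sketch, assuming a ./data folder of documents):

from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)  # uses the global service context
print(index.as_query_engine().query("Summarize these documents."))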

https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/llms/usage_custom.html#example-using-a-huggingface-llm

https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/usage_pattern.html#embedding-model-integrations

https://gpt-index.readthedocs.io/en/latest/core_modules/supporting_modules/service_context.html

logan-markewich avatar Jul 17 '23 13:07 logan-markewich

> can anyone comment on whether we can run everything in an offline environment where we have already downloaded the models, so there's no need for a connection to huggingface.co?

I can just say that for some reason the guys here DO NOT WANT to provide a simple example WITHOUT an openAI api key (maybe they are their contractors, or employees, or whatever ...). So if your goal is to achieve privacy, true privacy of your data, this is not the right place to seek a TRUE answer! (I asked them [politely] a couple of times to provide a simple example without openai infrastructure, and they offered me a colab notebook EXACTLY for openai (see the post above). They forget that examples with local LLMs and openAI number in the millions on the web! I will not spend any more time here ...

AngelTs avatar Jul 17 '23 14:07 AngelTs

@AngelTs I just provided a clear example of using local models from huggingface... in any case, best of luck in your LLM journey

logan-markewich avatar Jul 17 '23 15:07 logan-markewich

> @AngelTs once again, of course you can use local models without openai
>
> [full example quoted above]

@logan-markewich - Thank you, this worked fully locally (@AngelTs, I suspect you are having a PEBKAC issue).

However, I do have a question. I am having a hard time adapting this example to use this HF model: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ. I get the following error: TheBloke/Llama-2-7b-Chat-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack. Any advice?

MachLearnPort avatar Jul 26 '23 14:07 MachLearnPort

I am also getting the following error:

Traceback (most recent call last):
  File "/Users/ck/fugazi_tech/rag_projects/llama_banker/venv/lib/python3.11/site-packages/llama_index/core/embeddings/utils.py", line 59, in resolve_embed_model
    validate_openai_api_key(embed_model.api_key)
  File "/Users/ck/fugazi_tech/rag_projects/llama_banker/venv/lib/python3.11/site-packages/llama_index/embeddings/openai/utils.py", line 104, in validate_openai_api_key
    raise ValueError(MISSING_API_KEY_ERROR_MESSAGE)
ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ck/fugazi_tech/rag_projects/llama_banker/app.py", line 78, in <module>
    index = VectorStoreIndex.from_documents(documents)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ck/fugazi_tech/rag_projects/llama_banker/venv/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 145, in from_documents
    return cls(
           ^^^^
  File "/Users/ck/fugazi_tech/rag_projects/llama_banker/venv/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 71, in __init__
    else embed_model_from_settings_or_context(Settings, service_context)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ck/fugazi_tech/rag_projects/llama_banker/venv/lib/python3.11/site-packages/llama_index/core/settings.py", line 274, in embed_model_from_settings_or_context
    return settings.embed_model
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ck/fugazi_tech/rag_projects/llama_banker/venv/lib/python3.11/site-packages/llama_index/core/settings.py", line 67, in embed_model
    self._embed_model = resolve_embed_model("default")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ck/fugazi_tech/rag_projects/llama_banker/venv/lib/python3.11/site-packages/llama_index/core/embeddings/utils.py", line 66, in resolve_embed_model
    raise ValueError(
ValueError: 
******
Could not load OpenAI embedding model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
Original error:
No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys

Consider using embed_model='local'.
Visit our documentation for more embedding options: https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#modules
******

Even though I am explicitly passing a HuggingFace embedding in my code:

# Create embeddings instance explicitly with HuggingFaceEmbeddings
embed_model = LangchainEmbedding(HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2"))

llm = Bedrock(
    model_id=llamaModelId,
    model_kwargs={'max_gen_len': 2048, 'top_p': 0.5, 'temperature': 0.9}
)

service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embed_model  # Use the globally set HuggingFaceEmbeddings
)

# Set the global service context
llama_index.global_service_context = service_context 

Why is this the case? I couldn't find any resolution to this problem in any of the linked issues. Could you please point me in the right direction, @logan-markewich?

canberk17 avatar Mar 19 '24 09:03 canberk17

Using the following solved the problem for me:

embed_model = LangchainEmbedding(SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"))

instead of using this:

embed_model = LangchainEmbedding(HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2"))
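For anyone hitting the same thing: LangchainEmbedding is a wrapper for langchain embedding objects, while HuggingFaceEmbedding is llama-index's own class and needs no wrapper. Either pattern below should work on recent (v0.10-era) versions (a sketch; it assumes the llama-index-embeddings-huggingface / llama-index-embeddings-langchain and langchain-community packages are installed):

# option 1: llama-index's native embedding class, passed directly
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

# option 2: a langchain embedding, wrapped for llama-index
from langchain_community.embeddings import SentenceTransformerEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding
embed_model = LangchainEmbedding(
    SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)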

canberk17 avatar Mar 19 '24 16:03 canberk17

bump

Running the sample now prints a langchain deprecation warning and then the same OpenAI key error:

> from langchain.memory import ChatMessageHistory
>
> with new imports of:
>
> from langchain_community.chat_message_histories import ChatMessageHistory
>
> You can use the langchain cli to **automatically** upgrade many imports. Please see documentation here <https://python.langchain.com/v0.2/docs/versions/v0_2/>
>   warn_deprecated(
Traceback (most recent call last):
  File "/home/miller/.local/lib/python3.11/site-packages/llama_index/core/embeddings/utils.py", line 59, in resolve_embed_model
    validate_openai_api_key(embed_model.api_key)
  File "/home/miller/.local/lib/python3.11/site-packages/llama_index/embeddings/openai/utils.py", line 103, in validate_openai_api_key
    raise ValueError(MISSING_API_KEY_ERROR_MESSAGE)
ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lmaddox/innovanon.git/./test.py", line 56, in <module>
    lyft_index = VectorStoreIndex.from_documents(lyft_docs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miller/.local/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 145, in from_documents
    return cls(
           ^^^^
  File "/home/miller/.local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 71, in __init__
    else embed_model_from_settings_or_context(Settings, service_context)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miller/.local/lib/python3.11/site-packages/llama_index/core/settings.py", line 274, in embed_model_from_settings_or_context
    return settings.embed_model
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/miller/.local/lib/python3.11/site-packages/llama_index/core/settings.py", line 67, in embed_model
    self._embed_model = resolve_embed_model("default")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miller/.local/lib/python3.11/site-packages/llama_index/core/embeddings/utils.py", line 66, in resolve_embed_model
    raise ValueError(
ValueError: 
******
Could not load OpenAI embedding model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
Original error:
No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys

Consider using embed_model='local'.
Visit our documentation for more embedding options: https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#modules
******

This is from the sample... I just switched out the LLM, is all

#! /usr/bin/env python


from llama_index.core.agent import ReActAgent
#from llama_index.llms.openai import OpenAI
from llama_index.llms.ollama import Ollama
import asyncio
import nest_asyncio
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata

try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft"
    )
    lyft_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/uber"
    )
    uber_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:  # indices not persisted yet; build them below
    index_loaded = False

if not index_loaded:
    # load data
    lyft_docs = SimpleDirectoryReader(
        input_files=["./data/10k/lyft_2021.pdf"]
    ).load_data()
    uber_docs = SimpleDirectoryReader(
        input_files=["./data/10k/uber_2021.pdf"]
    ).load_data()

    # build index
    lyft_index = VectorStoreIndex.from_documents(lyft_docs)
    uber_index = VectorStoreIndex.from_documents(uber_docs)

    # persist index
    lyft_index.storage_context.persist(persist_dir="./storage/lyft")
    uber_index.storage_context.persist(persist_dir="./storage/uber")


lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)



query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]

# [Optional] Add Context
# context = """\
# You are a stock market sorcerer who is an expert on the companies Lyft and Uber.\
#     You will answer questions about Uber and Lyft as in the persona of a sorcerer \
#     and veteran stock market investor.
# """
#llm = OpenAI(model="gpt-3.5-turbo-0613")
llm = Ollama(model="tinyllama", request_timeout=600, base_url="http://kali.innovanon.com:11434")

agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    # context=context
)

# Try another query with async execution

nest_asyncio.apply()

response = asyncio.run(agent.achat(
    "Compare and contrast the risks of Uber and Lyft in 2021, then give an"
    " analysis"
))
print(str(response))

UPDATE:

prepend this to the file:

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
llm = Ollama(model="tinyllama", request_timeout=600, base_url="http://kali.innovanon.com:11434")
Settings.llm = llm
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

now the agent fails on the httpx request, but at least it doesn't try to hit up the openai api without permission

httpcore.ConnectError: [Errno 111] Connection refused
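For what it's worth, a quick way to check whether the Ollama server is reachable at all (same base_url as in the script; a plain GET on the root returns a short status string when Ollama is up):

import httpx

try:
    r = httpx.get("http://kali.innovanon.com:11434", timeout=5)
    print(r.status_code, r.text)  # expect "Ollama is running"
except httpx.ConnectError as exc:
    print("Ollama unreachable:", exc)  # the same failure the agent run hits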

lmaddox avatar Jul 13 '24 13:07 lmaddox