LangChain classes share openai global values
System Info
langchain==0.0.169
Who can help?
@hwchase17 @ekzhu
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [X] LLMs/Chat Models
- [X] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
```python
import os

import langchain
import openai
from langchain.llms import AzureOpenAI
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

llmconfig = {
    "openai_api_key": "<secret>",
    "openai_api_base": "https://myllm.openai.azure.com/",
    "deployment_name": "davinci",
}

chatconfig = {
    "model_name": "gpt-35-turbo",
    "openai_api_type": "azure",
    "openai_api_version": "chatVERSION",
    "openai_api_key": "<secret>",
    "openai_api_base": "https://mychat.openai.azure.com/",
    "deployment_name": "gpt-35-turbo",
}

embedderconfig = {
    "openai_api_key": "<secret>",
    "model": "ada",
    "openai_api_base": "https://myembedder.openai.azure.com/",
    "openai_api_version": "embedderVERSION",
    "deployment": "ada",
}

# First time
llm = AzureOpenAI(**llmconfig)
print(openai.api_version)
chat = AzureChatOpenAI(**chatconfig)
print(openai.api_version)
embedder = OpenAIEmbeddings(**embedderconfig)
print(openai.api_version)
print("\n")

# Second time
llm = AzureOpenAI(**llmconfig)
print(openai.api_version)
chat = AzureChatOpenAI(**chatconfig)
print(openai.api_version)
embedder = OpenAIEmbeddings(**embedderconfig)
print(openai.api_version)
```
This code will print the following:

```
None
chatVERSION
embedderVERSION

embedderVERSION
chatVERSION
embedderVERSION
```
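The last-writer-wins behaviour in that output can be reproduced without the real library. Below is a minimal pure-Python sketch: a `SimpleNamespace` stands in for the `openai` module, and `make_client` is a hypothetical function that mimics what the LangChain constructors were doing (none of this is LangChain code).

```python
from types import SimpleNamespace

# Stand-in for the `openai` module: one shared set of module-level globals.
fake_openai = SimpleNamespace(api_version=None, api_base=None)

def make_client(api_version, api_base):
    # Mimics what the LangChain constructors did: mutate the shared globals.
    fake_openai.api_version = api_version
    fake_openai.api_base = api_base
    return fake_openai  # every "client" is backed by the same shared object

chat = make_client("chatVERSION", "https://mychat.openai.azure.com/")
embedder = make_client("embedderVERSION", "https://myembedder.openai.azure.com/")

# Creating the embedder silently reconfigured the chat client too.
print(chat.api_version)  # embedderVERSION, not chatVERSION
```

Whichever object was constructed last wins, which is exactly the ordering dependence shown in the reproduction above.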
Expected behavior
The LangChain classes should not alter the global `openai` module values, because this causes conflicts when multiple classes rely on them: for example, when the Chat/Completion API and the Embeddings API use different `api_version` values, or when Chat/Completion runs on Azure while Embeddings uses OpenAI. Because the classes share the same `openai` globals, the behaviour depends on the order in which the objects are created.
Related issues: #2683 #4352
Related PR: https://github.com/hwchase17/langchain/pull/4234 https://github.com/pieroit/cheshire-cat/pull/195
Related code: https://github.com/hwchase17/langchain/blob/a7af32c274860ee9174830804301491973aaee0a/langchain/chat_models/azure_openai.py#L79-L87
and
https://github.com/hwchase17/langchain/blob/a7af32c274860ee9174830804301491973aaee0a/langchain/embeddings/openai.py#L166-L178
Related issue: https://github.com/openai/openai-python/issues/411
I see. I think it is possible to set all the parameters as keyword arguments in the `openai.Completion.create` and `openai.ChatCompletion.create` methods.
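That suggestion can be sketched with a stand-in function. The real `openai.Completion.create` / `openai.ChatCompletion.create` (openai-python 0.x) accept `api_key`, `api_base`, `api_type`, and `api_version` as per-call keyword arguments; the `create` below is a dummy that just echoes its kwargs, so the sketch runs without the library:

```python
# Dummy stand-in for openai.ChatCompletion.create: echoes its kwargs so we
# can see that each call carries its own settings instead of reading globals.
def create(**kwargs):
    return kwargs

chat_resp = create(
    api_type="azure",
    api_version="chatVERSION",
    api_base="https://mychat.openai.azure.com/",
    messages=[{"role": "user", "content": "hi"}],
)
embed_resp = create(
    api_type="azure",
    api_version="embedderVERSION",
    api_base="https://myembedder.openai.azure.com/",
    input="hi",
)

# Neither call touched any module-level state.
print(chat_resp["api_version"], embed_resp["api_version"])
```

With per-call kwargs, two configurations can coexist in one process because nothing is written to the `openai` module.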
Ran into this also... where somehow `OpenAIEmbeddings` is getting all the variables passed in, but it isn't passing them into the `openai.Embeddings` instance.
I would like to help on this one because it's blocking me from going further; my ada and gpt models are in two different Azure regions. @ekzhu could you put me on the right path to implement a workaround?
From what I understand, in embeddings.py we have the following code, which we also have in chat_models/azure_openai.py:

```python
try:
    import openai

    openai.api_key = openai_api_key
    if openai_organization:
        openai.organization = openai_organization
    if openai_api_base:
        openai.api_base = openai_api_base
    if openai_api_type:
        openai.api_version = openai_api_version
    if openai_api_type:
        openai.api_type = openai_api_type
    values["client"] = openai.Embedding
except ImportError:
    raise ValueError(
        "Could not import openai python package. "
        "Please install it with `pip install openai`."
    )
return values
```
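One possible shape of a fix, sketched here as a plain function (this is not the actual LangChain patch): collect the settings into the validated `values` dict so each instance carries its own configuration, instead of assigning them to the `openai` module. The key names are hypothetical.

```python
# Sketch only: gather per-instance connection settings instead of mutating
# openai.api_key / openai.api_base / openai.api_type / openai.api_version.
def validate_environment(values: dict) -> dict:
    values["client_params"] = {
        key: values[f"openai_{key}"]
        for key in ("api_key", "api_base", "api_type", "api_version", "organization")
        if values.get(f"openai_{key}")
    }
    # At request time these would be passed as kwargs to the create() call,
    # e.g. client.create(**values["client_params"], messages=...).
    return values

cfg = validate_environment(
    {"openai_api_key": "<secret>", "openai_api_version": "embedderVERSION"}
)
print(cfg["client_params"])  # {'api_key': '<secret>', 'api_version': 'embedderVERSION'}
```

Because nothing global is touched, two instances validated this way cannot clobber each other's settings.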
Both of those code paths use the `openai` module, which loads its settings from `os.environ` at import time (in `openai/__init__.py`):

```python
api_key = os.environ.get("OPENAI_API_KEY")

# Path of a file with an API key, whose contents can change. Supercedes
# `api_key` if set. The main use case is volume-mounted Kubernetes secrets,
# which are updated automatically.
api_key_path: Optional[str] = os.environ.get("OPENAI_API_KEY_PATH")

organization = os.environ.get("OPENAI_ORGANIZATION")
api_base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
api_type = os.environ.get("OPENAI_API_TYPE", "open_ai")
api_version = (
    "2023-03-15-preview" if api_type in ("azure", "azure_ad", "azuread") else None
)
```
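This is also why the env settings behave like singletons: those defaults are computed once, when `openai` is first imported, so later changes to the environment do not flow into already-computed values. A small runnable sketch of the same pattern (no `openai` import needed):

```python
import os

# Same pattern as openai/__init__.py: defaults are read from os.environ once,
# at import time.
os.environ.pop("OPENAI_API_TYPE", None)
api_type = os.environ.get("OPENAI_API_TYPE", "open_ai")
api_version = (
    "2023-03-15-preview" if api_type in ("azure", "azure_ad", "azuread") else None
)
print(api_type, api_version)  # open_ai None

# Setting the variable afterwards does not update the already-computed values.
os.environ["OPENAI_API_TYPE"] = "azure"
print(api_type, api_version)  # still: open_ai None
```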
I tried passing the base/key, and also creating the llm and embedding objects "on the fly" in the `ConversationalRetrievalChain.from_llm()` call, without luck.
Is the issue related to the fact that the env settings become singletons once the module is loaded? (not a Python expert)
What would be the best way to handle two different openai contexts? I looked at the openai forum but I didn't find a similar setup (with dual bases/keys).
I looked at the completion/create methods but didn't find a proper way to handle this in a single script.
@Pilosite: Can you please fix the Markdown formatting of the above comment? Thanks
@Pilosite I am not a python expert, but if I understand correctly what @ekzhu is suggesting:
Instead of doing this: https://github.com/hwchase17/langchain/blob/a7af32c274860ee9174830804301491973aaee0a/langchain/chat_models/azure_openai.py#L94
You need to do:

```python
values["client"] = openai.ChatCompletion.create(<parameters>)
```
Look at this:
```python
In [27]: import openai

In [32]: values = {}

In [33]: values["client"] = openai.ChatCompletion

In [34]: type(values["client"])
Out[34]: type

In [35]: values["client"]
Out[35]: openai.api_resources.chat_completion.ChatCompletion
```
Instead of that, you want to create an actual object (that will not be global). Example:
```python
In [36]: messages = [{"role": "system", "content": ""},]

In [37]: from openai import ChatCompletion

In [38]: c = ChatCompletion.create(engine="gpt-35-turbo",api_key="SECRET",api_type="azure",api_version="2023-03-15-preview",api_base="https://dummy.openai.azure.com/",messages=messages)

In [39]: type(c)
Out[39]: openai.openai_object.OpenAIObject

In [40]: c
Out[40]:
<OpenAIObject chat.completion id=chatcmpl-7J7GLapWegBSPW3YXQcSmEj5HykOK at 0x10ccfb0b0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, I cannot provide an answer without a specific question. Please provide more details so I can assist you better.",
        "role": "assistant"
      }
    }
  ],
  "created": 1684790505,
  "id": "chatcmpl-7J7GLapWegBSPW3YXQcSmEj5HykOK",
  "model": "gpt-35-turbo",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 8,
    "total_tokens": 33
  }
}
```
It is a bit weird, because to create the object you immediately need `messages`, and it will make an API call right away. It comes from here: https://github.com/openai/openai-python/blob/fe3abd16b582ae784d8a73fd249bcdfebd5752c9/openai/api_resources/chat_completion.py#L8-L30
I don't understand if there is a way to create an `<OpenAIObject chat.completion>` without immediately firing a POST request to the API.
@zioproto thanks a lot for your time, I will check !
> @Pilosite I am not a python expert, but if I understand correctly what @ekzhu is suggesting: […]
This is exactly what I was saying. 👍
a workaround is here: https://gist.github.com/kumapo/d32e0864ba81d94fb17e7d948f346e46
You can import `OpenAIEmbeddings` from `embeddings` and use it with the `EMBED_OPENAI_API_KEY` variables in environ.
thanks a lot @kumapo! I can confirm this works as expected, nice!
Could we make use of `partial` on the `validate_environment` method to bind the env keys for the `client`, and then not pass them any further down the pipeline? That way the env keys are read only once and set for that class instance forever, which would allow different instances with different keys/bases/etc.
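The `partial` idea could look roughly like this, sketched with a dummy `create` function (not the real API): `functools.partial` freezes each instance's keyword arguments once, at construction time, and the bound callables share no state.

```python
from functools import partial

# Dummy stand-in for a per-call API method that accepts connection settings
# as keyword arguments (as openai-python 0.x create() methods do).
def create(**kwargs):
    return kwargs

# Bind each instance's settings once; later calls only add request payload.
chat_create = partial(create, api_type="azure", api_version="chatVERSION")
embed_create = partial(create, api_type="azure", api_version="embedderVERSION")

print(chat_create(messages=[])["api_version"])  # chatVERSION
print(embed_create(input="hi")["api_version"])  # embedderVERSION
```

This also answers the earlier question about deferring the POST: binding the arguments does not fire a request; the call happens only when the bound callable is finally invoked.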
Not sure if everyone noticed, but this should now be solved since #5792 🎉
I confirm I can't reproduce this issue anymore with LangChain 0.0.199.