Breaking Changes | OpenAI Callback
System Info
langchain="^0.0.172"
The `get_openai_callback` callback (`from langchain.callbacks import get_openai_callback`) seems to have broken in a new release. It was working when I was on `^0.0.158`. The callback still runs, but no tokens or costs appear:

```
2023-05-18T02:41:37.844174Z [info ] openai charges [service.openai] completion_tokens=0 prompt_tokens=0 request_id=4da70135655b48d59d7f1e7528733f61 successful_requests=1 total_cost=0.0 total_tokens=0 user=user_2PkD3ZUhCdmBHiFPNP9tPZr7OLA
```
I was experiencing another breaking change with #4717 that seems to have been resolved.
Who can help?
@agola11 @vowelparrot
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [X] Callbacks/Tracing
- [ ] Async
Reproduction
```python
import structlog

from langchain.callbacks import get_openai_callback

openai_logger = structlog.getLogger("main")

# `agent`, `body`, and `user_id` come from the surrounding application
# (full code omitted); this runs inside an async handler.
with get_openai_callback() as cb:
    result = await agent.acall({"input": body.prompt}, return_only_outputs=True)
    openai_logger.info(
        "openai charges",
        prompt_tokens=cb.prompt_tokens,
        completion_tokens=cb.completion_tokens,
        total_tokens=cb.total_tokens,
        total_cost=cb.total_cost,
        successful_requests=cb.successful_requests,
        user=user_id,
    )
```
Expected behavior
I was expecting token counts and costs to appear.
Also ran into this today.
Thanks for flagging. Could you share more about how to reproduce? I've run the following code on the latest release, and it returns costs for all of them:
```python
# Run in an async context (e.g. a notebook), since top-level `await` is used.
from langchain.agents import AgentType, initialize_agent
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.schema import HumanMessage


def print_cost(name, cb):
    print(
        dict(
            name=name,
            prompt_tokens=cb.prompt_tokens,
            completion_tokens=cb.completion_tokens,
            total_tokens=cb.total_tokens,
            total_cost=cb.total_cost,
            successful_requests=cb.successful_requests,
        )
    )


# Test via an agent
for _llm in [ChatOpenAI, OpenAI]:
    llm = _llm()
    agent = initialize_agent(
        [], llm=llm, agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION
    )
    with get_openai_callback() as cb:
        result = await agent.acall(
            {"input": "Respond with the Final Answer of 2 + 2.", "chat_history": []},
            return_only_outputs=True,
        )
    print_cost(_llm.__name__, cb)

# Test the sync and async methods directly
for _llm in [ChatOpenAI, OpenAI]:
    llm = _llm()
    with get_openai_callback() as cb:
        llm.predict("What's your name?")
    print_cost(_llm.__name__, cb)

with get_openai_callback() as cb2:
    await ChatOpenAI().agenerate([[HumanMessage(content="Say a joke.")]])
print_cost("chat_async", cb2)

with get_openai_callback() as cb3:
    await OpenAI().agenerate(["What follows 3?"])
print_cost("davinci_async", cb3)
```
Reproduced it with the following:
```python
import asyncio

from dotenv import load_dotenv
from pydantic import BaseModel

from langchain.agents import AgentExecutor, StructuredChatAgent
from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI
from langchain.memory import ChatMessageHistory, ConversationBufferWindowMemory
from langchain.prompts import MessagesPlaceholder
from langchain.tools import StructuredTool

load_dotenv()


def print_cost(cb):
    print(
        dict(
            prompt_tokens=cb.prompt_tokens,
            completion_tokens=cb.completion_tokens,
            total_tokens=cb.total_tokens,
            total_cost=cb.total_cost,
            successful_requests=cb.successful_requests,
        )
    )


def load_tools():
    def every(*args, **kwargs):
        return "Use me for everything!"

    async def aevery(*args, **kwargs):
        return "Use me for everything!"

    class Schema(BaseModel):
        question: str

    return [
        StructuredTool(
            name="everything_tool",
            func=every,
            coroutine=aevery,
            args_schema=Schema,
            description="Use me for everything!",
        )
    ]


def load_structured_chat_agent(memory, tools, model_name="gpt-3.5-turbo", verbose=False, stream=False):
    memory_prompt = MessagesPlaceholder(variable_name="chat_history")
    agent = StructuredChatAgent.from_llm_and_tools(
        llm=OpenAI(temperature=0, model_name=model_name, streaming=stream, verbose=verbose),
        tools=tools,
        memory=memory,
        memory_prompts=[memory_prompt] if memory else None,
        return_intermediate_steps=True,
        input_variables=["input", "agent_scratchpad", "chat_history"] if memory else ["input", "agent_scratchpad"],
        verbose=verbose,
    )
    return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=verbose, memory=memory)


async def main():
    tools = load_tools()
    history = ChatMessageHistory()
    memory = ConversationBufferWindowMemory(
        memory_key="chat_history",
        output_key="output",
        chat_memory=history,
        return_messages=True,
        k=3,
    )
    agent = load_structured_chat_agent(memory, tools, verbose=True, stream=True)
    with get_openai_callback() as cb:
        result = await agent.acall(
            {"input": "Respond with the Final Answer of 2 + 2.", "chat_history": []},
            return_only_outputs=True,
        )
    print_cost(cb)


if __name__ == "__main__":
    asyncio.run(main())
```
The costs aren't being counted in streaming mode. If I set `stream=False`, then it gives:

```
{'prompt_tokens': 640, 'completion_tokens': 94, 'total_tokens': 734, 'total_cost': 0.0014680000000000001, 'successful_requests': 2}
```
> The costs aren't being counted in streaming mode. If I set `stream=False` then it gives: `{'prompt_tokens': 640, 'completion_tokens': 94, 'total_tokens': 734, 'total_cost': 0.0014680000000000001, 'successful_requests': 2}`

Was this working before? cc @agola11
It's definitely something we should add, but this code snippet seems different than the original poster's issue.
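For context, here is a minimal sketch of why streaming defeats the callback's accounting, assuming the pre-1.0 `openai` Python client that langchain 0.0.x wraps: streamed chunks carry no `usage` field, so there is nothing for `get_openai_callback` to sum.

```python
import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

messages = [{"role": "user", "content": "Say hi"}]

# Non-streaming: the response includes a "usage" block the callback can read.
resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(resp["usage"])  # e.g. {'prompt_tokens': 9, 'completion_tokens': 2, 'total_tokens': 11}

# Streaming: each chunk only carries a content delta, never a "usage" block,
# so a handler that reads token usage off the response sees nothing to count.
for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo", messages=messages, stream=True
):
    assert "usage" not in chunk
```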
Yeah, I had streaming and the callback working last week. I'm not sure why it has changed.
Sorry, I didn't post the full code in the original issue, as I had to strip out unrelated imports.
This is also not working (OpenAI callback with streaming) in 0.0.268: the number of tokens used is always 0.
Any update yet? Still facing the issue in the latest release.
If you add ConversationBufferMemory to your chain, it will make your OpenAI callback work (see the sketch below). With streaming enabled, the callback can't calculate the token cost on its own, but if you store the conversation in memory, it becomes able to calculate it.
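For what it's worth, here is an untested sketch of that suggestion using a plain `ConversationChain`; whether buffer memory actually restores accounting under streaming is the commenter's claim, not something verified in this thread:

```python
from langchain.callbacks import get_openai_callback
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Streaming model plus buffer memory, per the suggestion above.
chain = ConversationChain(
    llm=ChatOpenAI(streaming=True, temperature=0),
    memory=ConversationBufferMemory(),
)

with get_openai_callback() as cb:
    chain.predict(input="What is 2 + 2?")

print(cb.total_tokens, cb.total_cost)
```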
Hi, @jordanparker6! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, the issue you reported is related to the `get_openai_callback` callback in the langchain library. It seems that in the latest release, this callback is not returning tokens or costs as expected. Some users have provided code snippets to reproduce the issue, and the developers are currently investigating the problem. One user suggested that adding ConversationBufferMemory to the chain might resolve the issue.
Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself. If we don't receive any response within 7 days, the issue will be automatically closed.
Thank you for your understanding and cooperation. If you have any further questions or concerns, please don't hesitate to reach out.
I believe this still needs to be addressed; it seemed like numerous people in different issues were going to submit a PR. If nobody has done so yet, I would be willing to open a PR to attempt to address this.
@vowelparrot Could you please help @CameronVetter with this issue? They have indicated that it still needs to be addressed and are willing to do a PR if necessary. Thank you!
This is most definitely still an issue and has been driving me crazy for 3 days now. Please fix this, as there doesn't seem to be any other way to count tokens that works with LangChain and the OpenAI API.
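In the meantime, one possible workaround, sketched here with illustrative per-1K prices that are assumptions rather than quoted rates, is to count tokens client-side with `tiktoken` and price them yourself:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")


def estimate_cost(
    prompt: str,
    completion: str,
    prompt_rate_per_1k: float = 0.0015,    # assumed USD per 1K prompt tokens
    completion_rate_per_1k: float = 0.002, # assumed USD per 1K completion tokens
) -> float:
    """Rough estimate; ignores the few tokens of chat-format overhead per message."""
    prompt_tokens = len(enc.encode(prompt))
    completion_tokens = len(enc.encode(completion))
    return (
        prompt_tokens * prompt_rate_per_1k
        + completion_tokens * completion_rate_per_1k
    ) / 1000


print(estimate_cost("What is 2 + 2?", "2 + 2 = 4"))
```

This sidesteps the callback entirely, so it keeps working whether or not streaming is enabled.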