Breaking Changes | OpenAI Callback
System Info
langchain="^0.0.172"
The `get_openai_callback` callback (`from langchain.callbacks import get_openai_callback`) seems to have broken in a new release. It was working when I was on `^0.0.158`. The callback still runs, but no tokens or costs appear:

```
2023-05-18T02:41:37.844174Z [info ] openai charges [service.openai] completion_tokens=0 prompt_tokens=0 request_id=4da70135655b48d59d7f1e7528733f61 successful_requests=1 total_cost=0.0 total_tokens=0 user=user_2PkD3ZUhCdmBHiFPNP9tPZr7OLA
```
I was experiencing another breaking change with #4717 that seems to have been resolved.
Who can help?
@agola11 @vowelparrot
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [X] Callbacks/Tracing
- [ ] Async
Reproduction
```python
import structlog

from langchain.callbacks import get_openai_callback

openai_logger = structlog.getLogger("main")

# `agent`, `body`, and `user_id` come from the surrounding application
# (full code omitted); this runs inside an async handler.
with get_openai_callback() as cb:
    result = await agent.acall({"input": body.prompt}, return_only_outputs=True)
    openai_logger.info(
        "openai charges",
        prompt_tokens=cb.prompt_tokens,
        completion_tokens=cb.completion_tokens,
        total_tokens=cb.total_tokens,
        total_cost=cb.total_cost,
        successful_requests=cb.successful_requests,
        user=user_id,
    )
```
Expected behavior
I was expecting token counts and costs to appear.
Also ran into this today.
Thanks for flagging. Could you share more about how to reproduce? I've run the following code on the latest release, and it returns costs for all of them:
```python
# Run in an async context (e.g. a notebook), since top-level `await` is used.
from langchain.agents import AgentType, initialize_agent
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.schema import HumanMessage


def print_cost(name, cb):
    print(
        dict(
            name=name,
            prompt_tokens=cb.prompt_tokens,
            completion_tokens=cb.completion_tokens,
            total_tokens=cb.total_tokens,
            total_cost=cb.total_cost,
            successful_requests=cb.successful_requests,
        )
    )


# Test via an agent
for _llm in [ChatOpenAI, OpenAI]:
    llm = _llm()
    agent = initialize_agent(
        [], llm=llm, agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION
    )
    with get_openai_callback() as cb:
        result = await agent.acall(
            {"input": "Respond with the Final Answer of 2 + 2.", "chat_history": []},
            return_only_outputs=True,
        )
    print_cost(_llm.__name__, cb)

# Test the sync and async methods directly
for _llm in [ChatOpenAI, OpenAI]:
    llm = _llm()
    with get_openai_callback() as cb:
        llm.predict("What's your name?")
    print_cost(_llm.__name__, cb)

with get_openai_callback() as cb2:
    await ChatOpenAI().agenerate([[HumanMessage(content="Say a joke.")]])
print_cost("chat_async", cb2)

with get_openai_callback() as cb3:
    await OpenAI().agenerate(["What follows 3?"])
print_cost("davinci_async", cb3)
```
Reproduced it with the following:
```python
import asyncio

from dotenv import load_dotenv
from pydantic import BaseModel

from langchain.agents import AgentExecutor, StructuredChatAgent
from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI
from langchain.memory import ChatMessageHistory, ConversationBufferWindowMemory
from langchain.prompts import MessagesPlaceholder
from langchain.tools import StructuredTool

load_dotenv()


def print_cost(cb):
    print(
        dict(
            prompt_tokens=cb.prompt_tokens,
            completion_tokens=cb.completion_tokens,
            total_tokens=cb.total_tokens,
            total_cost=cb.total_cost,
            successful_requests=cb.successful_requests,
        )
    )


def load_tools():
    def every(*args, **kwargs):
        return "Use me for everything!"

    async def aevery(*args, **kwargs):
        return "Use me for everything!"

    class Schema(BaseModel):
        question: str

    return [
        StructuredTool(
            name="everything_tool",
            func=every,
            coroutine=aevery,
            args_schema=Schema,
            description="Use me for everything!",
        )
    ]


def load_structured_chat_agent(memory, tools, model_name="gpt-3.5-turbo", verbose=False, stream=False):
    memory_prompt = MessagesPlaceholder(variable_name="chat_history")
    agent = StructuredChatAgent.from_llm_and_tools(
        llm=OpenAI(temperature=0, model_name=model_name, streaming=stream, verbose=verbose),
        tools=tools,
        memory=memory,
        memory_prompts=[memory_prompt] if memory else None,
        return_intermediate_steps=True,
        input_variables=["input", "agent_scratchpad", "chat_history"] if memory else ["input", "agent_scratchpad"],
        verbose=verbose,
    )
    return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=verbose, memory=memory)


async def main():
    tools = load_tools()
    history = ChatMessageHistory()
    memory = ConversationBufferWindowMemory(
        memory_key="chat_history",
        output_key="output",
        chat_memory=history,
        return_messages=True,
        k=3,
    )
    agent = load_structured_chat_agent(memory, tools, verbose=True, stream=True)
    with get_openai_callback() as cb:
        result = await agent.acall(
            {"input": "Respond with the Final Answer of 2 + 2.", "chat_history": []},
            return_only_outputs=True,
        )
    print_cost(cb)


if __name__ == "__main__":
    asyncio.run(main())
```
The costs aren't being counted in streaming mode. If I set `stream=False`, then it gives:

```
{'prompt_tokens': 640, 'completion_tokens': 94, 'total_tokens': 734, 'total_cost': 0.0014680000000000001, 'successful_requests': 2}
```
> The costs aren't being counted in streaming mode. If I set `stream=False` then it gives: `{'prompt_tokens': 640, 'completion_tokens': 94, 'total_tokens': 734, 'total_cost': 0.0014680000000000001, 'successful_requests': 2}`

Was this working before? cc @agola11
It's definitely something we should add, but this code snippet seems different than the original poster's issue.
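For context, here is a minimal sketch of why streaming defeats the callback's accounting, assuming the pre-1.0 `openai` Python client that langchain 0.0.x wraps: streamed chunks carry no `usage` field, so there is nothing for `get_openai_callback` to sum.

```python
import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

messages = [{"role": "user", "content": "Say hi"}]

# Non-streaming: the response includes a "usage" block the callback can read.
resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(resp["usage"])  # e.g. {'prompt_tokens': 9, 'completion_tokens': 2, 'total_tokens': 11}

# Streaming: each chunk only carries a content delta, never a "usage" block,
# so a handler that reads token usage off the response sees nothing to count.
for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo", messages=messages, stream=True
):
    assert "usage" not in chunk
```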
Yeah, I had streaming and the callback working last week. I'm not sure why it has changed.
Sorry, I didn't post the full code in the original issue, as I had to strip out unrelated imports.
This is also not working (OpenAI callback with streaming) in 0.0.268: the number of tokens used is always 0.
Any update yet? Still facing the issue in the latest release.
If you add ConversationBufferMemory to your chain, it will make your OpenAI callback work (see the sketch below). With streaming enabled, the callback can't calculate the token cost on its own, but if you store the conversation in memory, it becomes able to calculate it.
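For what it's worth, here is an untested sketch of that suggestion using a plain `ConversationChain`; whether buffer memory actually restores accounting under streaming is the commenter's claim, not something verified in this thread:

```python
from langchain.callbacks import get_openai_callback
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Streaming model plus buffer memory, per the suggestion above.
chain = ConversationChain(
    llm=ChatOpenAI(streaming=True, temperature=0),
    memory=ConversationBufferMemory(),
)

with get_openai_callback() as cb:
    chain.predict(input="What is 2 + 2?")

print(cb.total_tokens, cb.total_cost)
```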
Hi, @jordanparker6! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, the issue you reported is related to the `get_openai_callback` callback in the langchain library. It seems that in the latest release, this callback is not returning tokens or costs as expected. Some users have provided code snippets to reproduce the issue, and the developers are currently investigating the problem. One user suggested that adding ConversationBufferMemory to the chain might resolve the issue.
Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself. If we don't receive any response within 7 days, the issue will be automatically closed.
Thank you for your understanding and cooperation. If you have any further questions or concerns, please don't hesitate to reach out.
I believe this still needs to be addressed; it seemed like numerous people in different issues were going to submit a PR. If nobody has done so yet, I would be willing to open a PR to attempt to address this.
@vowelparrot Could you please help @CameronVetter with this issue? They have indicated that it still needs to be addressed and are willing to do a PR if necessary. Thank you!
This is most definitely still an issue and has been driving me crazy for 3 days now. Please fix this, as there doesn't seem to be any other way to count tokens that works with LangChain and the OpenAI API.
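In the meantime, one possible workaround, sketched here with illustrative per-1K prices that are assumptions rather than quoted rates, is to count tokens client-side with `tiktoken` and price them yourself:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")


def estimate_cost(
    prompt: str,
    completion: str,
    prompt_rate_per_1k: float = 0.0015,    # assumed USD per 1K prompt tokens
    completion_rate_per_1k: float = 0.002, # assumed USD per 1K completion tokens
) -> float:
    """Rough estimate; ignores the few tokens of chat-format overhead per message."""
    prompt_tokens = len(enc.encode(prompt))
    completion_tokens = len(enc.encode(completion))
    return (
        prompt_tokens * prompt_rate_per_1k
        + completion_tokens * completion_rate_per_1k
    ) / 1000


print(estimate_cost("What is 2 + 2?", "2 + 2 = 4"))
```

This sidesteps the callback entirely, so it keeps working whether or not streaming is enabled.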