Feat: adding on_llm_end callback to ChatOpenAI
Kicking off this PR to get some feedback on the design choices for getting token usage working in ChatOpenAI.
Here's what I found so far; please let me know if I'm on the right track:
- `BaseChatModel`'s `__call__` calls ChatOpenAI's `_generate`.
- #1785 ensures `token_usage` is returned by `_generate`. However, there is no `on_llm_end()` callback to expose `token_usage` in `__call__`.
- Adding `on_llm_end()` just prior to returning from `__call__` solves the issue and successfully exposes `token_usage`, but results in a whole bunch of linting errors since the response type is now `ChatResult` and not the `LLMResult` that `on_llm_end()` is expecting.
- The PR currently implements the above point and has linting errors that I can go ahead and fix if this direction is valid (a sketch follows below). Or is there a better place to add the `on_llm_end()` callback to expose `token_usage` for ChatOpenAI? Please let me know your thoughts.
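For concreteness, this is roughly the shape of the change (a minimal sketch, not the exact diff in this PR; it assumes `BaseChatModel` exposes `callback_manager` and `verbose` the same way the completion-style LLM classes do):

```python
from typing import List, Optional

from langchain.schema import BaseMessage, LLMResult


# Sketch of BaseChatModel.__call__ with the proposed callback added.
def __call__(self, messages: List[BaseMessage], stop: Optional[List[str]] = None) -> BaseMessage:
    result = self._generate(messages, stop=stop)  # ChatResult; llm_output carries token_usage (#1785)
    # Fire on_llm_end so handlers such as get_openai_callback() can see token_usage.
    # on_llm_end expects an LLMResult, which is where the current linting errors come from.
    self.callback_manager.on_llm_end(
        LLMResult(generations=[result.generations], llm_output=result.llm_output),
        verbose=self.verbose,
    )
    return result.generations[0].message
```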
Current master returns 0 for `token_usage`:

```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks import get_openai_callback
from langchain.schema import HumanMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.8)
with get_openai_callback() as cb:
    response = chat([HumanMessage(content="Tell me a joke")])
    print(cb.total_tokens)
```

```
0
```

This PR fixes this issue, and the same code prints `25`.
@hwchase17, could you please take a look at this when you can?
we'll want to fix this in a different way. this callback already exists on this class somewhere, which likely means it's in the wrong place. i can take a look
this is a bit more complicated of a fix... let me think on it a bit
@hwchase17 How about creating chat-specific callbacks, like `on_chat_end`?
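For example, something along these lines (purely a hypothetical sketch; `on_chat_end` does not exist in langchain, it just illustrates the suggestion):

```python
from typing import Any, Dict

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import ChatResult


# Hypothetical handler: a chat-native hook that receives the ChatResult directly,
# so token_usage could be read without converting to an LLMResult first.
class ChatUsageHandler(BaseCallbackHandler):
    def on_chat_end(self, response: ChatResult, **kwargs: Any) -> None:
        token_usage: Dict[str, int] = (response.llm_output or {}).get("token_usage", {})
        print("chat finished, token_usage:", token_usage)
```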
Following this..
Following
following this
Following
following
@hwchase17 hi there, I was wondering if there are any plans to fix this, or any workarounds?
@hwchase17, is there any fix for this, or any workaround available? Waiting for a solution.
following
@hwchase17 can you provide at least a workaround for this?
`__call__` now uses `generate`, which now has an `on_llm_end` call in it, so this should be resolved! if anyone is still having issues let me know
Is it possible to tap into the raw response that is returned from OpenAI when you are using the ChatOpenAI module?
@dev2049 I just updated to langchain 0.0.190 and `on_llm_end` is getting called now. However, `token_usage` is always just an empty dict and it seems impossible to track actual usage. Should I file a new issue for that?
```python
from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult


class AsyncUsageTrackingCallbackHandler(AsyncCallbackHandler):
    def always_verbose(self) -> bool:
        return True

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print('on_llm_end', response)
```
Result:

```
on_llm_end generations=[[ChatGeneration(text='Great! What type of car do you currently own or plan to buy?', generation_info=None, message=AIMessage(content='Great! What type of car do you currently own or plan to buy?', additional_kwargs={}, example=False))]] llm_output={'token_usage': {}, 'model_name': 'gpt-4'}
```
@pencil any updates, did you find anything?
I looked into the langchain code but unfortunately wasn't able to pinpoint the issue in a jiffy given the complexity of its inner workings. Instead, I ended up implementing a workaround that tracks token usage via `tiktoken`:
```python
from typing import Any, Dict, List

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult
from tiktoken import encoding_for_model


class AsyncUsageTrackingCallbackHandler(AsyncCallbackHandler):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.tiktoken = encoding_for_model(model_name)
        self.tokens_sent = 0

    def always_verbose(self) -> bool:
        return True

    # TODO: The following is all we would need IN THEORY, however it doesn't
    # work because the token usage is not tracked correctly, see
    # https://github.com/hwchase17/langchain/pull/1924#issuecomment-1577142911
    #
    # def on_llm_end(self, response: LLMResult, **kwargs) -> None:
    #     track_tokens_from_llm_result(self.model_name, response)
    #
    # Instead, we manually track the tokens we send and receive using tiktoken.

    async def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], *args, **kwargs) -> Any:
        self.tokens_sent += sum(len(self.tiktoken.encode(prompt)) for prompt in prompts)

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens_received = sum(
            len(self.tiktoken.encode(g.message.content))
            for generations in response.generations
            for g in generations
        )
        print(f'[{self.model_name}] tokens sent: {self.tokens_sent}, tokens received: {tokens_received}')
```
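For reference, this is roughly how I attach it (sketch; assumes an async call via `agenerate`):

```python
import asyncio

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

handler = AsyncUsageTrackingCallbackHandler(model_name="gpt-3.5-turbo")
chat = ChatOpenAI(model_name="gpt-3.5-turbo", callbacks=[handler])

# The handler's on_llm_start / on_llm_end hooks fire as the model runs.
asyncio.run(chat.agenerate([[HumanMessage(content="Tell me a joke")]]))
```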
@dev2049 There seems to be a problem when `streaming=True`, all the stats are 0...
Issue described here: https://github.com/langchain-ai/langchain/issues/12339
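For example, the built-in counter comes back empty as soon as streaming is enabled (repro sketch, mirroring the snippet earlier in this thread):

```python
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", streaming=True)
with get_openai_callback() as cb:
    chat([HumanMessage(content="Tell me a joke")])
    print(cb.total_tokens)  # 0 with streaming=True: no token_usage comes back from the stream
```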
This callback can be used by both OpenAI and ChatOpenAI models:
```python
from typing import Any, Dict, List

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.callbacks.openai_info import get_openai_token_cost_for_model
from langchain.schema import LLMResult
from langchain_core.messages import BaseMessage
from tiktoken import encoding_for_model


class TokenUsageTrackingCallbackHandler(StreamingStdOutCallbackHandler):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.tiktoken = encoding_for_model(model_name)
        self.tokens_sent = 0

    def always_verbose(self) -> bool:
        return True

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], *args, **kwargs) -> Any:
        self.tokens_sent += sum(len(self.tiktoken.encode(prompt)) for prompt in prompts)

    def on_chat_model_start(
        self,
        serialized: Dict[str, Any],
        messages: List[List[BaseMessage]],
        **kwargs: Any,
    ) -> None:
        self.tokens_sent += sum(len(self.tiktoken.encode(prompt[0].content)) for prompt in messages)

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens_received = sum(
            len(self.tiktoken.encode(g.text))
            for generations in response.generations
            for g in generations
        )
        input_token_cost = get_openai_token_cost_for_model(model_name=self.model_name, num_tokens=self.tokens_sent)
        output_token_cost = get_openai_token_cost_for_model(model_name=self.model_name, num_tokens=tokens_received, is_completion=True)
        total_cost = input_token_cost + output_token_cost
        print(f'\n[{self.model_name}]\n\tTokens sent: {self.tokens_sent}\n\tTokens received: {tokens_received}\nTotal Cost (USD): ${total_cost}')
```
The above callback handler can be attached like this:

```python
ChatOpenAI(model_name='gpt-3.5-turbo-1106', temperature=0.3, streaming=True, callbacks=[TokenUsageTrackingCallbackHandler(model_name='gpt-3.5-turbo-1106')])
```
@meet1919 `self.tokens_sent += sum([len(self.tiktoken.encode(prompt[0].content)) for prompt in messages])` in your code only counts the first message of each prompt. To count the tokens from all system messages and human messages:

```python
def on_chat_model_start(
    self,
    serialized: Dict[str, Any],
    messages: List[List[BaseMessage]],
    **kwargs: Any,
) -> None:
    # Count every message in every prompt, not just the first one.
    self.tokens_sent += sum(
        len(self.tiktoken.encode(m.content)) for prompt in messages for m in prompt
    )
```
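With that change, a prompt containing both a system and a human message is counted in full, e.g. (usage sketch):

```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

handler = TokenUsageTrackingCallbackHandler(model_name='gpt-3.5-turbo-1106')
chat = ChatOpenAI(model_name='gpt-3.5-turbo-1106', streaming=True, callbacks=[handler])
chat([
    SystemMessage(content='You are a helpful assistant.'),
    HumanMessage(content='Tell me a joke'),
])
# tokens_sent now includes tokens from both messages, not just the first one
```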