
Feat: adding on_llm_end callback to ChatOpenAI

Open stephenleo opened this issue 1 year ago • 13 comments

Kicking off this PR to get feedback on the design choices for getting token usage working in ChatOpenAI.

Here's what I've found so far; please let me know if I'm on the right track:

  1. BaseChatModel's __call__ calls ChatOpenAI's _generate.
  2. #1785 ensures token_usage is returned by _generate. However, there is no on_llm_end() callback to expose token_usage in __call__.
  3. Adding an on_llm_end() call just prior to returning from __call__ solves the issue and successfully exposes token_usage, but it results in a number of linting errors, since the response type is now ChatResult rather than the LLMResult that on_llm_end() expects (see the sketch after this list).
  4. The PR currently implements the point above and has linting errors that I can go ahead and fix if this direction is valid. Or is there a better place to add the on_llm_end() call to expose token_usage for ChatOpenAI? Please let me know your thoughts.
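
To make point 3 concrete, here is a rough sketch of the ChatResult-to-LLMResult conversion that would be needed before firing the callback. This is not the actual langchain source and the helper name is hypothetical; ChatResult and LLMResult are the real schema classes.

from langchain.schema import ChatResult, LLMResult

# Hypothetical helper, for illustration only: wrap a ChatResult so that
# on_llm_end (which expects an LLMResult) can still receive token_usage.
def chat_result_to_llm_result(chat_result: ChatResult) -> LLMResult:
    # LLMResult.generations is a list of lists (one inner list per prompt),
    # while ChatResult.generations is a flat list for a single prompt.
    return LLMResult(
        generations=[chat_result.generations],
        llm_output=chat_result.llm_output,  # contains token_usage for OpenAI
    )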

Current master returns 0 for token_usage:

from langchain.chat_models import ChatOpenAI
from langchain.callbacks import get_openai_callback
from langchain.schema import HumanMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.8)

with get_openai_callback() as cb:
    response = chat([HumanMessage(content="Tell me a joke")])
    print(cb.total_tokens)

0

This PR fixes this issue, and the same code prints: 25

stephenleo avatar Mar 23 '23 12:03 stephenleo

@hwchase17, could you please take a look at this when you can?

stephenleo avatar Mar 28 '23 06:03 stephenleo

@hwchase17, could you please take a look at this when you can?

we'll want to fix this in a different way. this callback is already on this class in one place, which likely means it's in the wrong place. i can take a look

hwchase17 avatar Mar 28 '23 22:03 hwchase17

@hwchase17, could you please take a look at this when you can?

we'll want to fix this in a different way. this callback is already on this class in one place, which likely means it's in the wrong place. i can take a look

this is a bit more complicated of a fix... let me think on it a bit

hwchase17 avatar Mar 28 '23 22:03 hwchase17

@hwchase17 How about creating chat-specific callbacks, like on_chat_end?

fabioperez avatar Apr 10 '23 11:04 fabioperez

Following this..

oliverbj avatar Apr 11 '23 19:04 oliverbj

Following

Majiick avatar Apr 11 '23 22:04 Majiick

following this

rayli09 avatar Apr 14 '23 02:04 rayli09

Following

yuyuma avatar Apr 15 '23 22:04 yuyuma

following

achammah avatar Apr 21 '23 16:04 achammah

@hwchase17 hi there, I was wondering if there are any plans to fix this, or any workarounds?

rayli09 avatar Apr 22 '23 04:04 rayli09

@hwchase17, is there any fix for this, or any workarounds available? Waiting for a solution.

ezesculli avatar Apr 27 '23 17:04 ezesculli

following

stuartbgreen avatar Apr 27 '23 18:04 stuartbgreen

@hwchase17 can you provide at least a workaround for this?

ezesculli avatar May 03 '23 04:05 ezesculli

__call__ now uses generate, which has an on_llm_end call in it, so this should be resolved! If anyone is still having issues, let me know.
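
For anyone who wants to check this, here is a minimal sketch of a handler that prints the token_usage now surfaced through on_llm_end. The class name is made up; BaseCallbackHandler and LLMResult are the real imports.

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult

class TokenUsagePrinter(BaseCallbackHandler):
    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        # llm_output is model-specific; for OpenAI models it should include token_usage.
        usage = (response.llm_output or {}).get("token_usage", {})
        print("token_usage:", usage)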

dev2049 avatar May 18 '23 21:05 dev2049

Is it possible to tap into the raw response that is returned from OpenAI when you are using the ChatOpenAI module?

aligajani avatar May 28 '23 01:05 aligajani

If anyone is still having issues, let me know.

@dev2049 I just updated to langchain 0.0.190 and on_llm_end is getting called now. However, token_usage is always just an empty dict and it seems impossible to track actual usage. Should I file a new issue for that?

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult

class AsyncUsageTrackingCallbackHandler(AsyncCallbackHandler):
    def always_verbose(self) -> bool:
        return True

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print('on_llm_end', response)

Result:

on_llm_end generations=[[ChatGeneration(text='Great! What type of car do you currently own or plan to buy?', generation_info=None, message=AIMessage(content='Great! What type of car do you currently own or plan to buy?', additional_kwargs={}, example=False))]] llm_output={'token_usage': {}, 'model_name': 'gpt-4'}

pencil avatar Jun 05 '23 16:06 pencil

@pencil any updates, did you find anything?

rishhavv avatar Jul 13 '23 07:07 rishhavv

I looked into the langchain code but unfortunately wasn't able to pinpoint the issue in a jiffy given the complexity of its inner workings. Instead, I ended up implementing a workaround that tracks token usage via tiktoken:

from typing import Any, Dict, List

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult
from tiktoken import encoding_for_model


class AsyncUsageTrackingCallbackHandler(AsyncCallbackHandler):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.tiktoken = encoding_for_model(model_name)
        self.tokens_sent = 0

    def always_verbose(self) -> bool:
        return True

    # TODO: The following is all we would need IN THEORY, however it doesn't
    # work because the token usage is not tracked correctly, see
    # https://github.com/hwchase17/langchain/pull/1924#issuecomment-1577142911
    # def on_llm_end(self, response: LLMResult, **kwargs) -> None:
    #     track_tokens_from_llm_result(self.model_name, response)
    #
    # Instead, we manually track the tokens we send and receive using tiktoken
    async def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], *args, **kwargs) -> Any:
        self.tokens_sent += sum([len(self.tiktoken.encode(prompt)) for prompt in prompts])

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens_received = sum(
            [len(self.tiktoken.encode(g.message.content)) for generations in response.generations for g in generations]
        )
        print(f'[{self.model_name}] tokens sent: {self.tokens_sent}, tokens received: {tokens_received}')
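
For completeness, a hedged usage sketch for the handler above; the model name and prompt are illustrative, and since the handler is async it is attached to an async call.

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    model_name="gpt-4",
    callbacks=[AsyncUsageTrackingCallbackHandler(model_name="gpt-4")],
)
# Inside an async function / running event loop:
# await chat.agenerate([[HumanMessage(content="Tell me a joke")]])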

pencil avatar Jul 13 '23 17:07 pencil

__call__ now uses generate, which has an on_llm_end call in it, so this should be resolved! If anyone is still having issues, let me know.

@dev2049 There seems to be a problem when streaming=True: all the stats are 0...

Issue described here: https://github.com/langchain-ai/langchain/issues/12339
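
One possible workaround while streaming, as a rough sketch (the handler name is hypothetical; on_llm_new_token is the standard callback hook): count completion tokens as the chunks arrive instead of relying on llm_output, which is empty when streaming.

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult

class StreamingTokenCounter(BaseCallbackHandler):
    def __init__(self) -> None:
        self.completion_tokens = 0

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Each streamed chunk is roughly one token for OpenAI chat models.
        self.completion_tokens += 1

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print("completion tokens (streamed):", self.completion_tokens)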

rsarvis avatar Nov 01 '23 19:11 rsarvis

This callback can be used by both OpenAI and ChatOpenAI models:

from typing import Any, Dict, List

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.callbacks.openai_info import get_openai_token_cost_for_model
from langchain.schema import LLMResult
from tiktoken import encoding_for_model
from langchain_core.messages import BaseMessage

class TokenUsageTrackingCallbackHandler(StreamingStdOutCallbackHandler):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.tiktoken = encoding_for_model(model_name)
        self.tokens_sent = 0

    def always_verbose(self) -> bool:
        return True

    def on_llm_start(
            self, 
            serialized: Dict[str, Any], 
            prompts: List[str], 
            *args, 
            **kwargs
        ) -> Any:
        self.tokens_sent += sum([len(self.tiktoken.encode(prompt)) for prompt in prompts])

    def on_chat_model_start(
            self,
            serialized: Dict[str, Any],
            messages: List[List[BaseMessage]],
            **kwargs: Any,
        ) -> None:
        self.tokens_sent += sum([len(self.tiktoken.encode(prompt[0].content)) for prompt in messages])

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens_received = sum(
            [len(self.tiktoken.encode(g.text)) for generations in response.generations for g in generations]
        )
        input_token_cost = get_openai_token_cost_for_model(model_name=self.model_name, num_tokens=self.tokens_sent)
        output_token_cost = get_openai_token_cost_for_model(model_name=self.model_name, num_tokens=tokens_received, is_completion=True)
        total_cost = input_token_cost + output_token_cost
        print(f'\n[{self.model_name}]\n\tTokens sent: {self.tokens_sent}\n\tTokens received: {tokens_received}\nTotal Cost (USD): ${total_cost}')

The above callback handler can be used as follows:

ChatOpenAI(
    model_name='gpt-3.5-turbo-1106',
    temperature=0.3,
    streaming=True,
    callbacks=[TokenUsageTrackingCallbackHandler(model_name='gpt-3.5-turbo-1106')],
)
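
Note that when streaming=True, the llm_output passed to on_llm_end does not include token_usage (as reported above), which is why this handler counts tokens with tiktoken instead of reading them from the response.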

meet1919 avatar Dec 23 '23 13:12 meet1919

@meet1919

self.tokens_sent += sum([len(self.tiktoken.encode(prompt[0].content)) for prompt in messages]) in your code only counts the first message in each prompt. To count the tokens from all system and human messages:

    def on_chat_model_start(
            self,
            serialized: Dict[str, Any],
            messages: List[List[BaseMessage]],
            **kwargs: Any,
        ) -> None:
        # Count tokens for every message in every prompt, not just the first.
        self.tokens_sent += sum(
            len(self.tiktoken.encode(m.content)) for prompt in messages for m in prompt
        )

shuanglovesdata avatar May 23 '24 23:05 shuanglovesdata