Feat: adding on_llm_end callback to ChatOpenAI
Kicking off this PR to get some feedback on the design choices for getting token usage working in ChatOpenAI.
Here's what I found so far; please let me know if I'm on the right track:
- `BaseChatModel`'s `__call__` calls ChatOpenAI's `_generate`.
- #1785 ensures `token_usage` is returned by `_generate`. However, there is no `on_llm_end()` callback to expose `token_usage` in `__call__`.
- Adding `on_llm_end()` just prior to returning from `__call__` solves the issue and successfully exposes `token_usage`, but results in a whole bunch of linting errors since the response type is now `ChatResult` and not the `LLMResult` that `on_llm_end()` is expecting.
- The PR currently implements the above point and has linting errors that I can go ahead and fix if this direction is valid (a sketch follows below). Or is there a better place to add the `on_llm_end()` callback to expose `token_usage` for ChatOpenAI? Please let me know your thoughts.
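For concreteness, this is roughly the shape of the change (a minimal sketch, not the exact diff in this PR; it assumes `BaseChatModel` exposes `callback_manager` and `verbose` the same way the completion-style LLM classes do):

```python
from typing import List, Optional

from langchain.schema import BaseMessage, LLMResult


# Sketch of BaseChatModel.__call__ with the proposed callback added.
def __call__(self, messages: List[BaseMessage], stop: Optional[List[str]] = None) -> BaseMessage:
    result = self._generate(messages, stop=stop)  # ChatResult; llm_output carries token_usage (#1785)
    # Fire on_llm_end so handlers such as get_openai_callback() can see token_usage.
    # on_llm_end expects an LLMResult, which is where the current linting errors come from.
    self.callback_manager.on_llm_end(
        LLMResult(generations=[result.generations], llm_output=result.llm_output),
        verbose=self.verbose,
    )
    return result.generations[0].message
```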
Current master returns 0 for `token_usage`:

```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks import get_openai_callback
from langchain.schema import HumanMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.8)
with get_openai_callback() as cb:
    response = chat([HumanMessage(content="Tell me a joke")])
    print(cb.total_tokens)
```

```
0
```

This PR fixes this issue, and the same code prints `25`.
@hwchase17, could you please take a look at this when you can?
we'll want to fix this in a different way. this callback already exists on this class somewhere, which likely means it's in the wrong place. i can take a look
this is a bit more complicated of a fix... let me think on it a bit
@hwchase17 How about creating chat-specific callbacks, like `on_chat_end`?
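For example, something along these lines (purely a hypothetical sketch; `on_chat_end` does not exist in langchain, it just illustrates the suggestion):

```python
from typing import Any, Dict

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import ChatResult


# Hypothetical handler: a chat-native hook that receives the ChatResult directly,
# so token_usage could be read without converting to an LLMResult first.
class ChatUsageHandler(BaseCallbackHandler):
    def on_chat_end(self, response: ChatResult, **kwargs: Any) -> None:
        token_usage: Dict[str, int] = (response.llm_output or {}).get("token_usage", {})
        print("chat finished, token_usage:", token_usage)
```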
Following this..
Following
following this
Following
following
@hwchase17 hi there, I was wondering if there are any plans to fix this, or any workarounds?
@hwchase17, is there any fix for this, or any workaround available? Waiting for a solution.
following
@hwchase17 can you provide at least a workaround for this?
`__call__` now uses `generate`, which now has an `on_llm_end` call in it, so this should be resolved! if anyone is still having issues let me know
Is it possible to tap into the raw response that is returned from OpenAI when you are using the ChatOpenAI module?
@dev2049 I just updated to langchain 0.0.190 and `on_llm_end` is getting called now. However, `token_usage` is always just an empty dict and it seems impossible to track actual usage. Should I file a new issue for that?
```python
from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult


class AsyncUsageTrackingCallbackHandler(AsyncCallbackHandler):
    def always_verbose(self) -> bool:
        return True

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print('on_llm_end', response)
```
Result:

```
on_llm_end generations=[[ChatGeneration(text='Great! What type of car do you currently own or plan to buy?', generation_info=None, message=AIMessage(content='Great! What type of car do you currently own or plan to buy?', additional_kwargs={}, example=False))]] llm_output={'token_usage': {}, 'model_name': 'gpt-4'}
```
@pencil any updates, did you find anything?
I looked into the langchain code but unfortunately wasn't able to pinpoint the issue in a jiffy given the complexity of its inner workings. Instead, I ended up implementing a workaround that tracks token usage via `tiktoken`:
```python
from typing import Any, Dict, List

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult
from tiktoken import encoding_for_model


class AsyncUsageTrackingCallbackHandler(AsyncCallbackHandler):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.tiktoken = encoding_for_model(model_name)
        self.tokens_sent = 0

    def always_verbose(self) -> bool:
        return True

    # TODO: The following is all we would need IN THEORY, however it doesn't
    # work because the token usage is not tracked correctly, see
    # https://github.com/hwchase17/langchain/pull/1924#issuecomment-1577142911
    #
    # def on_llm_end(self, response: LLMResult, **kwargs) -> None:
    #     track_tokens_from_llm_result(self.model_name, response)
    #
    # Instead, we manually track the tokens we send and receive using tiktoken.

    async def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], *args, **kwargs) -> Any:
        self.tokens_sent += sum(len(self.tiktoken.encode(prompt)) for prompt in prompts)

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens_received = sum(
            len(self.tiktoken.encode(g.message.content))
            for generations in response.generations
            for g in generations
        )
        print(f'[{self.model_name}] tokens sent: {self.tokens_sent}, tokens received: {tokens_received}')
```
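For reference, this is roughly how I attach it (sketch; assumes an async call via `agenerate`):

```python
import asyncio

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

handler = AsyncUsageTrackingCallbackHandler(model_name="gpt-3.5-turbo")
chat = ChatOpenAI(model_name="gpt-3.5-turbo", callbacks=[handler])

# The handler's on_llm_start / on_llm_end hooks fire as the model runs.
asyncio.run(chat.agenerate([[HumanMessage(content="Tell me a joke")]]))
```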
@dev2049 There seems to be a problem when `streaming=True`, all the stats are 0...
Issue described here: https://github.com/langchain-ai/langchain/issues/12339
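For example, the built-in counter comes back empty as soon as streaming is enabled (repro sketch, mirroring the snippet earlier in this thread):

```python
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", streaming=True)
with get_openai_callback() as cb:
    chat([HumanMessage(content="Tell me a joke")])
    print(cb.total_tokens)  # 0 with streaming=True: no token_usage comes back from the stream
```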
This callback can be used by both OpenAI and ChatOpenAI models:
```python
from typing import Any, Dict, List

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.callbacks.openai_info import get_openai_token_cost_for_model
from langchain.schema import LLMResult
from langchain_core.messages import BaseMessage
from tiktoken import encoding_for_model


class TokenUsageTrackingCallbackHandler(StreamingStdOutCallbackHandler):
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.tiktoken = encoding_for_model(model_name)
        self.tokens_sent = 0

    def always_verbose(self) -> bool:
        return True

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], *args, **kwargs) -> Any:
        self.tokens_sent += sum(len(self.tiktoken.encode(prompt)) for prompt in prompts)

    def on_chat_model_start(
        self,
        serialized: Dict[str, Any],
        messages: List[List[BaseMessage]],
        **kwargs: Any,
    ) -> None:
        self.tokens_sent += sum(len(self.tiktoken.encode(prompt[0].content)) for prompt in messages)

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens_received = sum(
            len(self.tiktoken.encode(g.text))
            for generations in response.generations
            for g in generations
        )
        input_token_cost = get_openai_token_cost_for_model(model_name=self.model_name, num_tokens=self.tokens_sent)
        output_token_cost = get_openai_token_cost_for_model(model_name=self.model_name, num_tokens=tokens_received, is_completion=True)
        total_cost = input_token_cost + output_token_cost
        print(f'\n[{self.model_name}]\n\tTokens sent: {self.tokens_sent}\n\tTokens received: {tokens_received}\nTotal Cost (USD): ${total_cost}')
```
The above callback handler can be attached like this:

```python
ChatOpenAI(model_name='gpt-3.5-turbo-1106', temperature=0.3, streaming=True, callbacks=[TokenUsageTrackingCallbackHandler(model_name='gpt-3.5-turbo-1106')])
```
@meet1919 `self.tokens_sent += sum([len(self.tiktoken.encode(prompt[0].content)) for prompt in messages])` in your code only counts the first message of each prompt. To count the tokens from all system messages and human messages:

```python
def on_chat_model_start(
    self,
    serialized: Dict[str, Any],
    messages: List[List[BaseMessage]],
    **kwargs: Any,
) -> None:
    # Count every message in every prompt, not just the first one.
    self.tokens_sent += sum(
        len(self.tiktoken.encode(m.content)) for prompt in messages for m in prompt
    )
```
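With that change, a prompt containing both a system and a human message is counted in full, e.g. (usage sketch):

```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

handler = TokenUsageTrackingCallbackHandler(model_name='gpt-3.5-turbo-1106')
chat = ChatOpenAI(model_name='gpt-3.5-turbo-1106', streaming=True, callbacks=[handler])
chat([
    SystemMessage(content='You are a helpful assistant.'),
    HumanMessage(content='Tell me a joke'),
])
# tokens_sent now includes tokens from both messages, not just the first one
```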