get_openai_callback doesn't work with streaming=True
System Info
langchain 0.0.166
Who can help?
@agola11
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [X] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [X] Callbacks/Tracing
- [X] Async
Reproduction
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.callbacks import get_openai_callback

question = "What is the answer of the meaning of life?"

prompt = PromptTemplate(
    input_variables=["input"],
    template="{input}",
)

llm = ChatOpenAI(temperature=0.7, max_tokens=2000, streaming=True)
chain = LLMChain(llm=llm, prompt=prompt)

with get_openai_callback() as cb:
    print(chain.run(question))
    print("\n\n")
    print(cb)
result
As an AI language model, I do not have a personal belief system or opinion, and therefore, I do not have an answer to this question. The meaning of life is a philosophical and subjective topic that varies from person to person. It is up to individuals to find their own purpose and meaning in life.
Tokens Used: 0
Prompt Tokens: 0
Completion Tokens: 0
Successful Requests: 1
Total Cost (USD): $0.0
When streaming=False is set, it works.
Expected behavior
It should return token usage info whether streaming=True or streaming=False.
related issue https://github.com/hwchase17/langchain/issues/3114
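For context, this is roughly why the counters stay at zero: get_openai_callback only reads token usage out of response.llm_output in on_llm_end, and at the time the streaming OpenAI response did not include a token_usage block, so there is nothing to add up. A minimal sketch of that logic (simplified, not the exact library source; the class name here is only illustrative):

import tiktoken  # not needed for the sketch itself, just for later workarounds
from langchain.schema import LLMResult


class SketchOpenAICallbackHandler:
    """Simplified stand-in for OpenAICallbackHandler to show where the zeros come from."""

    total_tokens = 0
    prompt_tokens = 0
    completion_tokens = 0
    successful_requests = 0

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        self.successful_requests += 1
        if response.llm_output is None:
            return
        # With streaming=True the provider reports no usage, so this dict is empty
        # and every counter stays at 0 while successful_requests still increments.
        usage = response.llm_output.get("token_usage", {})
        self.total_tokens += usage.get("total_tokens", 0)
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)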
Hi,
To work around this bug I created my own async cost-calculator handler (requires the tiktoken dependency):
from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult
from typing import Any, Dict, List
import tiktoken
MODEL_COST_PER_1K_TOKENS = {
    "gpt-4": 0.03,
    "gpt-4-0314": 0.03,
    "gpt-4-completion": 0.06,
    "gpt-4-0314-completion": 0.06,
    "gpt-4-32k": 0.06,
    "gpt-4-32k-0314": 0.06,
    "gpt-4-32k-completion": 0.12,
    "gpt-4-32k-0314-completion": 0.12,
    "gpt-3.5-turbo": 0.002,
    "gpt-3.5-turbo-0301": 0.002,
    "text-ada-001": 0.0004,
    "ada": 0.0004,
    "text-babbage-001": 0.0005,
    "babbage": 0.0005,
    "text-curie-001": 0.002,
    "curie": 0.002,
    "text-davinci-003": 0.02,
    "text-davinci-002": 0.02,
    "code-davinci-002": 0.02,
}
class TokenCostProcess:
    total_tokens: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0

    def sum_prompt_tokens(self, tokens: int):
        self.prompt_tokens = self.prompt_tokens + tokens
        self.total_tokens = self.total_tokens + tokens

    def sum_completion_tokens(self, tokens: int):
        self.completion_tokens = self.completion_tokens + tokens
        self.total_tokens = self.total_tokens + tokens

    def sum_successful_requests(self, requests: int):
        self.successful_requests = self.successful_requests + requests

    def get_openai_total_cost_for_model(self, model: str) -> float:
        return MODEL_COST_PER_1K_TOKENS[model] * self.total_tokens / 1000

    def get_cost_summary(self, model: str) -> str:
        cost = self.get_openai_total_cost_for_model(model)
        return (
            f"Tokens Used: {self.total_tokens}\n"
            f"\tPrompt Tokens: {self.prompt_tokens}\n"
            f"\tCompletion Tokens: {self.completion_tokens}\n"
            f"Successful Requests: {self.successful_requests}\n"
            f"Total Cost (USD): {cost}"
        )
class CostCalcAsyncHandler(AsyncCallbackHandler):
    model: str = ""
    socketprint = None
    websocketaction: str = "appendtext"
    token_cost_process: TokenCostProcess

    def __init__(self, model, token_cost_process):
        self.model = model
        self.token_cost_process = token_cost_process

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        if self.token_cost_process is None:
            return
        # Count prompt tokens locally with tiktoken, since the streaming
        # response does not report usage.
        encoding = tiktoken.encoding_for_model(self.model)
        for prompt in prompts:
            self.token_cost_process.sum_prompt_tokens(len(encoding.encode(prompt)))

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token)
        # Each streamed token counts as one completion token.
        self.token_cost_process.sum_completion_tokens(1)

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        self.token_cost_process.sum_successful_requests(1)
I use it this way:
token_cost_process = TokenCostProcess()

chat = ChatOpenAI(
    streaming=True,
    callbacks=[CostCalcAsyncHandler("gpt-3.5-turbo", token_cost_process)],
    temperature=0,
    model_name="gpt-3.5-turbo",
)

...

print(token_cost_process.get_cost_summary("gpt-3.5-turbo"))
I hope this helps someone.
Thanks,
Do you know if there's any progress on that?
I have no idea, Michal! I don't know anyone from the project; my contribution was only as a user.
@hwchase17 @dev2049 The solution by @acoronadoc for cost tracking here seems pretty solid for streaming chat. Do you agree? We could make a PR for it.
Wondering about the same thing (and thanks for the contribution!)
LangSmith records streaming tokens. Any progress?
A variation inspired by @acoronadoc
from typing import Any, Dict, List
import tiktoken
from langchain.callbacks.base import AsyncCallbackHandler
from langchain.callbacks.openai_info import (
MODEL_COST_PER_1K_TOKENS,
get_openai_token_cost_for_model,
)
from langchain.schema import LLMResult
class Cost:
def __init__(
self, total_cost=0, total_tokens=0, prompt_tokens=0, completion_tokens=0
):
self.total_cost = total_cost
self.total_tokens = total_tokens
self.prompt_tokens = prompt_tokens
self.completion_tokens = completion_tokens
def __str__(self):
return (
f"Tokens Used: {self.total_tokens}\n"
f"\tPrompt Tokens: {self.prompt_tokens}\n"
f"\tCompletion Tokens: {self.completion_tokens}\n"
f"Total Cost (USD): ${self.total_cost}"
)
def add_prompt_tokens(self, prompt_tokens):
self.prompt_tokens += prompt_tokens
self.total_tokens += prompt_tokens
def add_completion_tokens(self, completion_tokens):
self.completion_tokens += completion_tokens
self.total_tokens += completion_tokens
class CostCalcCallbackHandler(AsyncCallbackHandler):
def __init__(self, model_name, cost, *args, **kwargs):
self.model_name = model_name
self.encoding = tiktoken.encoding_for_model(model_name)
self.cost = cost
super().__init__(*args, **kwargs)
async def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
for prompt in prompts:
self.cost.add_prompt_tokens(len(self.encoding.encode(prompt)))
async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
for generation_list in response.generations:
for generation in generation_list:
self.cost.add_completion_tokens(
len(self.encoding.encode(generation.text))
)
if self.model_name in MODEL_COST_PER_1K_TOKENS:
prompt_cost = get_openai_token_cost_for_model(
self.model_name, self.cost.prompt_tokens
)
completion_cost = get_openai_token_cost_for_model(
self.model_name, self.cost.completion_tokens, is_completion=True
)
self.cost.total_cost = prompt_cost + completion_cost
Example usage:
import asyncio
from langchain.chat_models import ChatOpenAI
model_name = "gpt-3.5-turbo"
cost = Cost()
chat = ChatOpenAI(
    streaming=True,
    callbacks=[CostCalcCallbackHandler(model_name, cost)],
    temperature=0,
    model_name=model_name,
)
asyncio.run(chat.apredict(text="What's 2+2?"))
print(cost)
# Tokens Used: 17
# Prompt Tokens: 9
# Completion Tokens: 8
# Total Cost (USD): $2.95e-05
Hi, @shadowlinyf,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you raised regarding the get_openai_callback function not working with streaming=True has been confirmed and discussed by several users, including acoronadoc, pors, MichalBortkiewicz, and nick-solly. Potential solutions and variations have been proposed, but the issue remains unresolved.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to LangChain!
I am still having this issue with streaming
@nick-solly @acoronadoc Does this still work with the new ChatOpenAI (from langchain_openai import ChatOpenAI)?
I'm using the above callback code/setup on an OpenAI Functions Agent and I am seeing a token count, but when I compare it to the token count on the LangSmith trace, I see a different number of tokens. Do you know why?
In LangSmith i see: 1671 tokens
And in my Callback print i see: 1094 tokens
Does anyone know where the discrepancy comes from, and how to fix it?
@hwchase17 Any thoughts on this? How does LangSmith go about calculating token counts?
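One plausible source of the gap (an assumption, not confirmed for this trace): the tiktoken-based handlers above encode only the raw prompt strings and generation text, while the chat API also bills per-message overhead tokens, function/tool schemas, and function-call arguments, which show up in the real usage LangSmith records. A hedged sketch of the more complete per-message counting, following the pattern from the OpenAI cookbook (the overhead constants are assumptions and vary slightly by model; function schemas are still not counted here):

from typing import Dict, List

import tiktoken


def num_tokens_from_messages(messages: List[Dict[str, str]], model: str = "gpt-3.5-turbo") -> int:
    """Approximate the billed prompt tokens for a chat completion request.

    Assumes roughly 3 overhead tokens per message plus 3 tokens to prime the
    assistant reply, as in the OpenAI cookbook example for gpt-3.5/gpt-4 models.
    """
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # assumed overhead per message
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encoding.encode(str(value)))
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens


# Example: this yields a few more tokens than encoding the strings alone.
print(num_tokens_from_messages([{"role": "user", "content": "What's 2+2?"}]))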
I have a solution: set stream_usage=True when you initialize the ChatOpenAI instance, then wrap the call in get_openai_callback():
from langchain_openai import ChatOpenAI
from langchain_community.callbacks import get_openai_callback

llm = ChatOpenAI(temperature=0, stream_usage=True)

with get_openai_callback() as cb:
    for chunk in llm.stream("What is Map Reduce"):
        print(chunk)
    print(cb.prompt_tokens)
    print(cb.completion_tokens)
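For what it's worth, with stream_usage=True the provider-reported usage should also be attached to the streamed message itself, so you can read it off the aggregated chunks instead of (or in addition to) the callback. A small sketch, assuming a recent langchain-openai/langchain-core where AIMessageChunk exposes usage_metadata and chunks can be merged with +:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, stream_usage=True)

# Merge the streamed chunks; the combined message carries usage_metadata
# when the provider returns usage along with the stream.
aggregate = None
for chunk in llm.stream("What is Map Reduce"):
    aggregate = chunk if aggregate is None else aggregate + chunk

if aggregate is not None and aggregate.usage_metadata:
    print(aggregate.usage_metadata)
    # e.g. {'input_tokens': ..., 'output_tokens': ..., 'total_tokens': ...}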