langchainjs
How to get token count: callbacks work for ChatOpenAI but not for RetrievalQAChain
I am trying to get a token count for a process. I am passing callbacks to the class initialization like this:
let finalTokens = 0
const initPayload = {
  openAIApiKey: process.env['OPEN_AI_KEY'],
  temperature: 1.5,
  callbacks: [
    {
      handleLLMEnd: (val) => {
        try {
          const tokens = val.llmOutput.tokenUsage.totalTokens
          finalTokens += tokens
          console.log({tokens, finalTokens})
        } catch {
          console.log(val.generations[0])
        }
      },
    },
  ],
};
However all of the calls from a RetrievalQAChain end up in the catch portion of that try-catch block as 'tokenUsage' does not exist for those calls. Can someone point me in the right direction?
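For reference, guarding the handler with optional chaining (a sketch of the same payload as above) avoids the throw, but it still reports 0 tokens for the RetrievalQAChain calls because tokenUsage simply isn't there:

  let finalTokens = 0;

  const initPayload = {
    openAIApiKey: process.env['OPEN_AI_KEY'],
    temperature: 1.5,
    callbacks: [
      {
        handleLLMEnd: (val) => {
          // tokenUsage may be missing (e.g. for the RetrievalQAChain calls), so guard before reading it
          const tokens = val.llmOutput?.tokenUsage?.totalTokens ?? 0;
          finalTokens += tokens;
          console.log({ tokens, finalTokens });
        },
      },
    ],
  };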
I'm having a similar issue. When I define gpt-3.5-turbo as the model for the OpenAI construct, llmOutput is missing the tokenUsage object. Using the same construct, but not defining the model, returns token usage as part of llmOutput.
Not working:
const model = new OpenAI({
openAIApiKey: openAISecret,
modelName: 'gpt-3.5-turbo',
callbacks: [
{
handleLLMEnd: async (output: LLMResult) => {
logger.info('output', { output })
logger.info('tokenUsage', { tokenUsage: output.llmOutput })
// tokenUsage: UNDEFINED
},
},
],
})
Working:
const model = new OpenAI({
openAIApiKey: openAISecret,
callbacks: [
{
handleLLMEnd: async (output: LLMResult) => {
logger.info('output', { output })
logger.info('tokenUsage', { tokenUsage: output.llmOutput })
// tokenUsage: found
},
},
],
})
I added my own issue for this.
Yeah, the problem is that not defining the model uses text-davinci-003, which costs $0.02 per 1K tokens vs. gpt-3.5-turbo, which is $0.002.
Use it this way:
import { ChatOpenAI } from "langchain/chat_models/openai";
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
I found that importing it like that returns the tokenUsage in the handleLLMEnd handler.
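For what it's worth, here is a slightly fuller sketch of that pattern (the prompt and logging are illustrative, and it assumes a LangChain version where .invoke is available):

  import { ChatOpenAI } from "langchain/chat_models/openai";
  import { LLMResult } from "langchain/schema";

  const llm = new ChatOpenAI({
    modelName: "gpt-3.5-turbo",
    callbacks: [
      {
        handleLLMEnd: async (output: LLMResult) => {
          // With the chat model class (and streaming off), llmOutput.tokenUsage is populated
          console.log(output.llmOutput?.tokenUsage);
          // => { completionTokens, promptTokens, totalTokens }
        },
      },
    ],
  });

  await llm.invoke("Say hello in five words.");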
Still waiting for a solution on this.
Tested with the Azure API:
curl -X POST -H 'Content-type: application/json' -H 'User-Agent: OpenAI/NodeJS/3.3.0' -H 'api-key: xxxxx' --data '{"model":"gpt-3.5-turbo","temperature":0.7,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"!"}]}' https://{azureApiInstanceName}.openai.azure.com/openai/deployments/{azureOpenAIApiDeploymentName}/chat/completions\?api-version=2023-05-15
With stream=false the response includes usage data and works as expected; with stream=true the result has "usage": null.
I am also running into this. There doesn't seem to be any way to grab cost, or at least token usage, when calling chains or agents. Having an output after a chain or agent finishes with total usage would be great.
Same issue, tokenUsage is not returned when using OpenAI() model.
Same problem here; when streaming is set to true it doesn't return token usage. Any ideas for a workaround?
Hello everyone,
I recently started working on a stealth startup, and I'm using langchainjs as a core component of our tech stack. I must say, I've been impressed with the work done here! Thank you so much for all your hard work on this project, and for providing tools that startups like mine can rely on!
While integrating the library, I noticed the lack of token statistics when using ChatOpenAI in streaming mode. I did some digging in the code and I believe I found the source of the problem.
In the _generate method of the ChatOpenAI class it is expected that data.usage contains the completion_tokens, prompt_tokens, and total_tokens fields, which are later copied to tokenUsage. When ChatOpenAI is instantiated with streaming: false, the response.data field from the call to OpenAIApi.createChatCompletion is returned as data; response.data is an instance of Completion, which indeed contains usage with the required fields. That's why token usage works with streaming: false. When ChatOpenAI is instantiated with streaming: true, the response object is not created by the OpenAIApi but in the code of _generate instead. This branch of the implementation doesn't set the usage field at all.
I believe that adding the required fields using .getNumTokensFromMessages(...) might address this.
// EDIT 2023-08-17 8:50 CET
I did some more digging. It's not as simple as I thought. Using .getNumTokensFromMessages(...) would introduce two more calls to the OpenAI API. Using it to get tokenUsage for each call with streaming: true would introduce additional cost for all users of the library, even if they don't care about token usage.
It turns out that the original langchain implementation has the same problem. When streaming=True, the ChatResult instance is created without the llm_output field, which contains the token usage stats.
Both implementations are actually correct, as the source of the problem lies within the OpenAI API. When streaming is enabled, the token usage statistics are not sent to the client at all. What is sent is a stream of chat.completion.chunk objects that don't contain any token information.
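Until the API itself returns usage for streamed responses, one workaround is to estimate the counts client-side with a tokenizer. A rough sketch, assuming js-tiktoken is installed; it counts the raw prompt text and the streamed chunks, ignoring the chat format's per-message overhead, so it is only an approximation:

  import { encodingForModel } from "js-tiktoken";

  const enc = encodingForModel("gpt-3.5-turbo");
  const countTokens = (text: string): number => enc.encode(text).length;

  let promptTokens = 0;
  let completionTokens = 0;

  const callbacks = [
    {
      handleChatModelStart: async (_llm: unknown, messages: any[][]) => {
        // Count every message in the outgoing prompt
        for (const message of messages[0]) {
          promptTokens += countTokens(String(message.content));
        }
      },
      handleLLMNewToken: async (token: string) => {
        // Each streamed chunk adds to the completion count
        completionTokens += countTokens(token);
      },
    },
  ];

  // Pass `callbacks` to the ChatOpenAI constructor (with streaming: true) or to a chain call,
  // then read promptTokens / completionTokens afterwards.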
Did anyone find a solution for this?
I think the reason is that the gpt-3.5-turbo model can only be used with chat models (the v1/chat/completions endpoint):
curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ...." \
-d '{
"model": "gpt-3.5-turbo",
"prompt": "Say this is a test",
"max_tokens": 7,
"temperature": 0
}'
{
"error": {
"message": "This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?",
"type": "invalid_request_error",
"param": "model",
"code": null
}
}
I had to update my old code from 'OpenAI' to 'ChatOpenAI', and that fixed the issue.
// old
// const model = new OpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });
// new
const model = new ChatOpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });
const prompt = PromptTemplate.fromTemplate(
"What is a good name for a company that makes {product}?"
);
const chain = new LLMChain({ llm: model, prompt });
const resA2 = await chain.run("colorful socks", {callbacks: [{
handleLLMEnd: (output, runId, parentRunId?, tags?) => {
const { completionTokens, promptTokens, totalTokens } =
output.llmOutput?.tokenUsage;
console.log(completionTokens ?? 0);
console.log(promptTokens ?? 0);
console.log(totalTokens ?? 0);
// "llmOutput": {
// "tokenUsage": {
// "completionTokens": 3,
// "promptTokens": 20,
// "totalTokens": 23
// }
// }
},
}]});
I managed to count tokens for streaming: true by using callbacks:
const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", streaming: true });
const chain = new LLMChain({ llm: model, prompt })
const { text: assistantResponse } = await chain.call({
query: query,
}, {
callbacks: [
{
handleChatModelStart: async (llm, messages) => {
const tokenCount = tokenCounter(messages[0][0].content);
// The prompt is available here: messages[0][0].content
},
handleChainEnd: async (outputs) => {
const { text: outputText } = outputs;
// outputText is the response from the chat call
const tokenCount = tokenCounter(outputText);
}
}
]
}
);
Doesn't that only account for the initial prompt and the final response (not any intermediate calls for functions, etc)?
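Probably, yes: handleChainEnd only sees the chain's final output. One way around that is to attach the counting callbacks to the model itself, since handleChatModelStart and handleLLMEnd fire for every model invocation, including intermediate ones. A sketch of that variant, assuming the same hypothetical tokenCounter helper and counters declared in the enclosing scope:

  import { ChatOpenAI } from "langchain/chat_models/openai";

  // Hypothetical tokenCounter(text) helper (e.g. js-tiktoken) plus running totals.
  let promptTokens = 0;
  let completionTokens = 0;

  const model = new ChatOpenAI({
    modelName: "gpt-3.5-turbo",
    streaming: true,
    callbacks: [
      {
        handleChatModelStart: async (_llm, messages) => {
          // Fires for every model call the chain or agent makes, not just the first one
          for (const m of messages[0]) {
            promptTokens += tokenCounter(String(m.content));
          }
        },
        handleLLMEnd: async (output) => {
          // Fires once per model call with whatever text was generated
          for (const generation of output.generations.flat()) {
            completionTokens += tokenCounter(generation.text);
          }
        },
      },
    ],
  });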
Updating my old code from 'OpenAI' to 'ChatOpenAI' solved the issue for me too!
Any news on this?
I still get an empty object for the token usage with streaming mode enabled.
Hi thread, I am using the TypeScript SDK of LangChain. I am still receiving a 0 token count. Can you please help here?
@jacoblee93 Any help here ?
@hwchase17 @nfcampos @bracesproul @sullivan-sean Any help here ?
Yes, I'm experiencing the same issue here. The token counter doesn't seem to be working for agents; I'm getting all token counts as 0. Here is the code I'm using and the log I'm getting back.
Package Version: 1.36.0 V8 and Chromium: Node: 20.9.0; Chromium: 122
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate, MessagesPlaceholder } from 'langchain/prompts';
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";
import { AgentExecutor, createOpenAIToolsAgent } from "langchain/agents";
// Define the tools the agent will have access to.
const tools = [new TavilySearchResults({ maxResults: 1, apiKey: 'MY-API-KEY' })];
const llm = new ChatOpenAI({
modelName: "gpt-4-turbo",
temperature: 0.15,
maxRetries: 3,
timeout: 30000,
callbacks: [
{
handleLLMEnd(output) {
console.log(output)
output.generations.map(generation => {
generation.map(g => {
// console.log(g.message.response_metadata.tokenUsage)
})
})
},
}
]
});
const prompt = ChatPromptTemplate.fromMessages([
[
'system',
`You are a virtual agent`,
],
new MessagesPlaceholder({
variableName: 'chat_history',
optional: true,
}),
['user', '{input}'],
new MessagesPlaceholder({
variableName: 'agent_scratchpad',
optional: false,
}),
]);
const agent = await createOpenAIToolsAgent({
llm,
tools,
prompt,
});
const agentExecutor = new AgentExecutor({
agent,
tools,
});
const result = await agentExecutor.invoke({
input: "what is LangChain?, describe it in a sentence",
});
console.log(result);
The output
{
generations: [
[
ChatGenerationChunk {
text: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
generationInfo: {
prompt: 0,
completion: 0,
finish_reason: 'stop'
},
message: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
additional_kwargs: {},
response_metadata: {
prompt: 0,
completion: 0,
finish_reason: 'stop'
},
tool_call_chunks: [],
tool_calls: [],
invalid_tool_calls: []
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
name: undefined,
additional_kwargs: {},
response_metadata: {
prompt: 0,
completion: 0,
finish_reason: 'stop'
},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: []
},
__proto__: {
constructor: ƒ ChatGenerationChunk(),
concat: ƒ concat()
}
}
]
]
}
I just wrote my own using the OpenAI API. The implementation is not that complex and you have more control, and you don't have to wait over a year for someone else to fix it.
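For anyone taking the same route, here is a minimal sketch of that approach with the official openai Node SDK (v4 style, non-streaming), where usage comes back on the response itself; the model and prompt are just placeholders:

  import OpenAI from "openai";

  const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Say this is a test" }],
  });

  // usage is returned on every non-streamed chat completion
  console.log(completion.usage);
  // => { prompt_tokens, completion_tokens, total_tokens }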
same issue
@bracesproul Brace, I think the 0 token issue is a very serious problem, any chance you can look into it?
Hey community, I created this counter. It might not be perfect, but I tested it against LangSmith and it gets a pretty close count. If you have any ideas to improve it, you're more than welcome to. I hope you find it useful:
import { encodingForModel } from 'js-tiktoken';
export class TokenCounter {
private _totalTokens: number = 0;
private _promptTokens: number = 0;
private _completionTokens: number = 0;
private _enc: any;
constructor(model) {
this._enc = encodingForModel(model);
}
encodeAndCountTokens(text: string): number {
return this._enc.encode(text).length;
}
handleLLMEnd(result: any) {
result.generations.forEach((generation: any) => {
const content = generation[0]?.message?.text || '';
const calls = generation[0]?.message?.additional_kwargs || '';
console.log('Calls & Content:', {
calls,
content,
});
const output = JSON.stringify(calls, null, 2);
const tokens = this.encodeAndCountTokens(content + output);
this._completionTokens += tokens;
});
console.log('Tokens for this LLMEnd:', this._completionTokens);
}
handleChatModelStart(_, args) {
args[0].forEach((arg) => {
const content = arg?.content || '';
const calls = arg?.additional_kwargs || '';
const tokens = this.encodeAndCountTokens(
content + JSON.stringify(calls, null, 2),
);
this._promptTokens += tokens;
console.log('content:', content, calls);
});
console.log('Tokens for this ChatModelStart:', this._promptTokens);
}
modelTracer() {
return {
handleChatModelStart: this.handleChatModelStart.bind(this),
handleLLMEnd: this.handleLLMEnd.bind(this),
};
}
sumTokens() {
this._totalTokens = this._promptTokens + this._completionTokens;
console.log('Total Tokens:', this._totalTokens);
}
}
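Here is a possible way to wire it up (the model names are just examples; encodingForModel has to recognize whatever you pass to the constructor, e.g. "gpt-4" or "gpt-3.5-turbo"):

  // Reuses the TokenCounter class above.
  const counter = new TokenCounter("gpt-4");

  const llm = new ChatOpenAI({
    modelName: "gpt-4-turbo",
    // modelTracer() returns the handleChatModelStart / handleLLMEnd handlers defined above
    callbacks: [counter.modelTracer()],
  });

  // ...build the agent or chain with `llm` as before, then run it...
  // const result = await agentExecutor.invoke({ input: "what is LangChain?" });

  // Afterwards, add prompt and completion tokens together and log the total
  counter.sumTokens();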
I will try your solution as soon as possible, thank you very much.
The problem is that LangSmith often shows 0 tokens. This makes a very important piece of LangSmith functionality unusable due to this problem in LangChain. I hope @bracesproul or @jacoblee93 will look into this issue.
Yes, will fix this as OpenAI recently added support. There is an open PR here https://github.com/langchain-ai/langchainjs/pull/5485
Hey @jacoblee93. I just tested release 0.2.4 and it still does not show the token usage when using RunnableSequence.
The code:
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.0 });
const vectorStore = await FaissStore.load(`data/search_index_${projectId}.pkl`, new OpenAIEmbeddings());
const vectorStoreRetriever = vectorStore.asRetriever();
const SYSTEM_TEMPLATE = `...`;
const messages = [
SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
HumanMessagePromptTemplate.fromTemplate("{question}"),
];
const prompt = ChatPromptTemplate.fromMessages(messages);
const chain = RunnableSequence.from([
{
sourceDocuments: RunnableSequence.from([
(input) => input.question,
vectorStoreRetriever,
]),
question: (input) => input.question,
},
{
sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
question: (previousStepResult) => previousStepResult.question,
context: (previousStepResult) =>
formatDocumentsAsString(previousStepResult.sourceDocuments),
},
{
result: prompt.pipe(llm).pipe(new StringOutputParser()),
sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
},
]);
return await chain.stream({question: question}, {
callbacks: [
{
handleLLMEnd(output: LLMResult, runId: string, parentRunId?: string, tags?: string[]): any {
output.generations.map((g) => console.log(JSON.stringify(g, null, 2)));
}
}
]
});
The output is as follows:
[
{
"text": "<the loooong answer goes here>",
"generationInfo": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"message": {
"lc": 1,
"type": "constructor",
"id": [
"langchain_core",
"messages",
"AIMessageChunk"
],
"kwargs": {
"content": "<the loooong answer goes here>",
"additional_kwargs": {},
"response_metadata": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"tool_call_chunks": [],
"tool_calls": [],
"invalid_tool_calls": []
}
}
}
]
Am I looking for the token count in the wrong place? Or has providing the token count in the handleLLMEnd callback not been implemented yet?
Can you verify you're on latest version of core and LangChain OpenAI?
https://js.langchain.com/v0.2/docs/how_to/installation/#installing-integration-packages
Otherwise I will check tomorrow.
Yes definitely:
❯ npm list
[email protected] ...
├── @langchain/[email protected]
├── @langchain/[email protected]
├── @langchain/[email protected]
...
├── [email protected]
...
All these packages are on latest.
I see OpenAI just released an update for that:
https://cookbook.openai.com/examples/how_to_stream_completions#4-how-to-get-token-usage-data-for-streamed-chat-completion-response
and it seems like it was already done via this PR
EDIT: I should add that I'm using the LangChain Agents. I'm guessing support for token usage hasn't reached them yet.
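For reference, the cookbook change is the stream_options: { include_usage: true } flag; with it, the API sends one final chunk whose usage field is populated (earlier chunks have usage: null). A rough sketch with the raw openai SDK, outside of LangChain, in case it helps verify what the API now returns:

  import OpenAI from "openai";

  const client = new OpenAI();

  const stream = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Say this is a test" }],
    stream: true,
    stream_options: { include_usage: true },
  });

  for await (const chunk of stream) {
    if (chunk.usage) {
      // The last chunk carries the aggregate usage; earlier chunks have usage: null
      console.log(chunk.usage);
    }
  }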
Unfortunately I am also on the latest packages, and I get a 0 token count even for the last chunk that is supposed to contain usage. Zero counts happen for the handleLLMEnd callback, the last message of .streamEvents, and the .invoke response.
{
"generations": [
[
{
"text": "Hi there! How can I assist you today?",
"generationInfo": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"message": {
"lc": 1,
"type": "constructor",
"id": [
"langchain_core",
"messages",
"AIMessageChunk"
],
"kwargs": {
"content": "Hi there! How can I assist you today?",
"additional_kwargs": {},
"response_metadata": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"tool_call_chunks": [],
"tool_calls": [],
"invalid_tool_calls": []
}
}
}
]
]
}
{
"generations": [
[
{
"text": "removed for readability",
"generationInfo": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"message": {
"lc": 1,
"type": "constructor",
"id": [
"langchain_core",
"messages",
"AIMessageChunk"
],
"kwargs": {
"content": "removed for readability",
"additional_kwargs": {},
"response_metadata": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"tool_call_chunks": [],
"tool_calls": [],
"invalid_tool_calls": []
}
}
}
]
]
}
├── @langchain/[email protected]
├── @langchain/[email protected]
├── @langchain/[email protected]
├── [email protected]