
How to get token count: callbacks work for ChatOpenAI but not for RetrievalQAChain

[Open] theTechGoose opened this issue 1 year ago • 21 comments

I am trying to get a token count for a process. I am passing callbacks to the class initialization like this:

let finalTokens = 0
const initPayload = {
  openAIApiKey: process.env['OPEN_AI_KEY'],
  temperature: 1.5,
  callbacks: [
    {
      handleLLMEnd: (val) => {
        try {
          const tokens = val.llmOutput.tokenUsage.totalTokens
          finalTokens += tokens
          console.log({tokens, finalTokens})
        } catch {
          console.log(val.generations[0])
        }
      },
    },
  ],
};

However, all of the calls from a RetrievalQAChain end up in the catch portion of that try-catch block, as tokenUsage does not exist for those calls. Can someone point me in the right direction?
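As an aside, the try/catch can be avoided with optional chaining. Below is a minimal sketch of a more defensive handler (assuming the LLMResult type exported from langchain/schema at the time); it still reports 0 when the API doesn't send usage:

import type { LLMResult } from "langchain/schema";

let finalTokens = 0;

const handleLLMEnd = (val: LLMResult) => {
  // tokenUsage is absent for some call paths (e.g. chat models in streaming mode),
  // so fall back to 0 instead of throwing.
  const tokens = val.llmOutput?.tokenUsage?.totalTokens ?? 0;
  finalTokens += tokens;
  console.log({ tokens, finalTokens });
};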

theTechGoose avatar Apr 23 '23 20:04 theTechGoose

I'm having a similar issue. When I define gpt-3.5-turbo as the model for the OpenAI construct, llmOutput is missing the tokenUsage object.

Using the same construct but not defining the model returns token usage as part of llmOutput.

Not working:

  const model = new OpenAI({
    openAIApiKey: openAISecret,
    modelName: 'gpt-3.5-turbo',
    callbacks: [
      {
        handleLLMEnd: async (output: LLMResult) => {
          logger.info('output', { output })
          logger.info('tokenUsage', { tokenUsage: output.llmOutput })
          // tokenUsage: UNDEFINED
        },
      },
    ],
  })

Working:

  const model = new OpenAI({
    openAIApiKey: openAISecret,
    callbacks: [
      {
        handleLLMEnd: async (output: LLMResult) => {
          logger.info('output', { output })
          logger.info('tokenUsage', { tokenUsage: output.llmOutput })
          // tokenUsage: found
        },
      },
    ],
  })

I opened a separate issue for this.

miekassu avatar Apr 28 '23 08:04 miekassu

Yeah, the problem is that not defining the model uses text-davinci-003, which costs $0.02 per 1K tokens, vs gpt-3.5-turbo, which is $0.002.


theTechGoose avatar Apr 28 '23 16:04 theTechGoose

Use it this way:

import { ChatOpenAI } from "langchain/chat_models/openai";

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });

I found that importing it like that returns tokenUsage in the handleLLMEnd handler.
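For reference, a minimal sketch of that combination, assuming the mid-2023 API where chat models expose call() and message classes live in langchain/schema:

import { ChatOpenAI } from "langchain/chat_models/openai";
import { HumanChatMessage } from "langchain/schema";

const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
  callbacks: [
    {
      handleLLMEnd(output) {
        // With the chat model class (and streaming disabled), tokenUsage is populated.
        console.log(output.llmOutput?.tokenUsage);
      },
    },
  ],
});

await llm.call([new HumanChatMessage("Say hello")]);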

ciocan avatar May 04 '23 15:05 ciocan

Still waiting for a solution on this.

ibrahimyaacob92 avatar Jul 02 '23 15:07 ibrahimyaacob92

Tested with the Azure API:

curl -X POST -H 'Content-type: application/json' -H 'User-Agent: OpenAI/NodeJS/3.3.0' -H 'api-key: xxxxx' --data '{"model":"gpt-3.5-turbo","temperature":0.7,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"!"}]}' 'https://{azureApiInstanceName}.openai.azure.com/openai/deployments/{azureOpenAIApiDeploymentName}/chat/completions?api-version=2023-05-15'

With stream=false the response includes usage data and works as expected. With stream=true the result contains "usage": null.

pond918 avatar Jul 09 '23 06:07 pond918

I am also running into this. There doesn't seem to be any way to grab cost, or at least token usage, when calling chains or agents. Having an output after a chain or agent finishes with the total usage would be great.

j1philli avatar Jul 11 '23 06:07 j1philli

Same issue: tokenUsage is not returned when using the OpenAI() model.

fvisticot avatar Aug 01 '23 14:08 fvisticot

Same problem here: when streaming is set to true, it doesn't return token usage. Any ideas for a workaround?

nikorter avatar Aug 14 '23 10:08 nikorter

Hello everyone,

I recently started working on a stealth startup, and I'm using langchainjs as a core component of our tech stack. I must say, I've been impressed with the work done here! Thank you so much for all your hard work on this project, and for providing tools that startups like mine can rely on!

While integrating the library, I noticed the lack of token statistics when using ChatOpenAI in streaming mode. I did some digging in the code and I believe I found the source of the problem.

In the _generate method of the ChatOpenAI class it's expected that data.usage contains the completion_tokens, prompt_tokens, and total_tokens fields, which are later copied to tokenUsage. When ChatOpenAI is instantiated with streaming: false, the response.data field from the call to OpenAIApi.createChatCompletion is returned as data. response.data is an instance of Completion, which indeed contains usage with the required fields. That's why token usage works with streaming: false. When ChatOpenAI is instantiated with streaming: true, the response object is not created by OpenAIApi but in the code of _generate instead. This branch of the implementation doesn't set the usage field at all.

I believe that adding the required fields using .getNumTokensFromMessages(...) might address this.

// EDIT 2023-08-17 8:50 CET

I did some more digging. It's not as simple as I thought. Using .getNumTokensFromMessages(...) would introduce two more calls to the OpenAI API. Using it to get tokenUsage for each call with streaming: true would introduce additional cost for all users of the library, even if they don't care about token usage.

It turns out that the original langchain implementation has the same problem. When streaming=True, the ChatResult instance is created without the llm_output field, which contains the token usage stats.

Both implementations are actually correct, as the source of the problem lies within the OpenAI API. When streaming is enabled, the token usage statistics are not sent to the client at all. What is sent is a stream of chat.completion.chunk objects that don't contain any token information.
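For illustration, a streamed chunk at the time looked roughly like the object below (shape abbreviated, values are placeholders); note there is no usage field anywhere in the stream:

// Approximate shape of a chat.completion.chunk circa 2023 (abbreviated, illustrative values).
// There is no `usage` field, so neither langchain implementation has anything to report.
const exampleChunk = {
  id: "chatcmpl-abc123",
  object: "chat.completion.chunk",
  created: 1692000000,
  model: "gpt-3.5-turbo-0613",
  choices: [{ index: 0, delta: { content: "Hello" }, finish_reason: null }],
};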

mchalapuk avatar Aug 17 '23 06:08 mchalapuk

Did anyone find a solution for this?

MirzaHasnat avatar Sep 12 '23 19:09 MirzaHasnat

I think the reason is that the gpt-3.5-turbo model is a chat model and can only be used through the chat completions endpoint.

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ...." \
  -d '{
    "model": "gpt-3.5-turbo",         
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "temperature": 0
  }'
{
  "error": {
    "message": "This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

I had to update my old code from 'OpenAI' to 'ChatOpenAI', and that fixed the issue.

// old
// const model = new OpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });
// new 
const model = new ChatOpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });

const prompt = PromptTemplate.fromTemplate(
  "What is a good name for a company that makes {product}?"
);

const chain = new LLMChain({ llm: model, prompt }); 

const resA2 = await chain.run("colorful socks", {callbacks: [{
  handleLLMEnd: (output, runId, parentRunId?, tags?) => {
    const { completionTokens, promptTokens, totalTokens } =
      output.llmOutput?.tokenUsage ?? {}; // fall back to {} so destructuring doesn't throw
    console.log(completionTokens ?? 0);
    console.log(promptTokens ?? 0);
    console.log(totalTokens ?? 0);
    
// "llmOutput": {
//     "tokenUsage": {
//       "completionTokens": 3,
//       "promptTokens": 20,
//       "totalTokens": 23
//     }
//   }
  },
}]});

ankitruong avatar Sep 23 '23 19:09 ankitruong

I managed to count tokens for streaming: true by using callbacks:

const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", streaming: true });
const chain = new LLMChain({ llm: model, prompt })
const { text: assistantResponse } = await chain.call({
    query: query,
  }, {
    callbacks: [
      {
        handleChatModelStart: async (llm, messages) => {
          const tokenCount = tokenCounter(messages[0][0].content);
          // The prompt is available here: messages[0][0].content
        },
        handleChainEnd: async (outputs) => {
          const { text: outputText } = outputs;
          // outputText is the response from the chat call
          const tokenCount = tokenCounter(outputText);
        }
      }
    ]
  }
);
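The tokenCounter helper isn't shown above; a minimal sketch of one possible implementation with js-tiktoken (the helper name and model choice are assumptions, not part of the original snippet):

import { encodingForModel } from "js-tiktoken";

// Rough local estimate of how many tokens a string encodes to for a given model.
const enc = encodingForModel("gpt-3.5-turbo");

function tokenCounter(text: string): number {
  return enc.encode(text).length;
}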

liowalex avatar Sep 27 '23 19:09 liowalex

> (quoting liowalex's streaming-callbacks workaround above)

Doesn't that only account for the initial prompt and the final response (not any intermediate calls for functions, etc)?

jwilger avatar Oct 09 '23 23:10 jwilger

> (quoting ankitruong's OpenAI to ChatOpenAI fix above)

This solved the issue for me too!

girithodu avatar Dec 12 '23 06:12 girithodu

Any news on this?

I still get an empty object for the token usage with streaming mode enabled.

brokenfiles avatar Mar 08 '24 13:03 brokenfiles

Hi thread, I am using the TypeScript SDK of LangChain. I am still receiving a 0 token count. Can anyone please help here?

rutwikpulseenergy avatar Apr 17 '24 03:04 rutwikpulseenergy

@jacoblee93 Any help here ?

rutwikpulseenergy avatar Apr 24 '24 13:04 rutwikpulseenergy

@hwchase17 @nfcampos @bracesproul @sullivan-sean Any help here ?

rutwikpulseenergy avatar Apr 25 '24 02:04 rutwikpulseenergy

Yes, I'm experiencing the same issue here. The token counter seems not to be working for agents; I'm getting all token counts as 0. Here is the code I'm using and the log I'm getting back.

Package version: 1.36.0. Node: 20.9.0; Chromium: 122.

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate, MessagesPlaceholder } from 'langchain/prompts';
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";
import { AgentExecutor, createOpenAIToolsAgent } from "langchain/agents";


// Define the tools the agent will have access to.
const tools = [new TavilySearchResults({ maxResults: 1, apiKey: 'MY-API-KEY' })];

const llm = new ChatOpenAI({
  modelName: "gpt-4-turbo",
  temperature: 0.15,
  maxRetries: 3,
  timeout: 30000,
  callbacks: [
    {
      handleLLMEnd(output) {
        console.log(output)
        output.generations.map(generation => {
          generation.map(g => {
            // console.log(g.message.response_metadata.tokenUsage)
          })
        })
      },
    }
  ]
});

const prompt = ChatPromptTemplate.fromMessages([
        [
            'system',
            `You are a virtual agent`,
        ],
        new MessagesPlaceholder({
            variableName: 'chat_history',
            optional: true,
        }),
        ['user', '{input}'],
        new MessagesPlaceholder({
            variableName: 'agent_scratchpad',
            optional: false,
        }),
    ]);

const agent = await createOpenAIToolsAgent({
  llm,
  tools,
  prompt,
});

const agentExecutor = new AgentExecutor({
  agent,
  tools,
});

const result = await agentExecutor.invoke({
  input: "what is LangChain?, describe it in a sentence",
});

console.log(result);

The output

{
  generations: [
    [
      ChatGenerationChunk {
        text: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
        generationInfo: {
          prompt: 0,
          completion: 0,
          finish_reason: 'stop'
        },
        message: AIMessageChunk {
          lc_serializable: true,
          lc_kwargs: {
            content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
            additional_kwargs: {},
            response_metadata: {
              prompt: 0,
              completion: 0,
              finish_reason: 'stop'
            },
            tool_call_chunks: [],
            tool_calls: [],
            invalid_tool_calls: []
          },
          lc_namespace: [ 'langchain_core', 'messages' ],
          content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
          name: undefined,
          additional_kwargs: {},
          response_metadata: {
            prompt: 0,
            completion: 0,
            finish_reason: 'stop'
          },
          tool_calls: [],
          invalid_tool_calls: [],
          tool_call_chunks: []
        },
        __proto__: {
          constructor: ƒ ChatGenerationChunk(),
          concat: ƒ concat()
        }
      }
    ]
  ]
}

clovisrodriguez avatar Apr 30 '24 10:04 clovisrodriguez

I just wrote my own using the OpenAI API. The implementation is not that complex, you get more control, and you don't have to wait over a year for someone else to fix it.
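For anyone going the same route, a rough sketch of reading usage straight from the OpenAI Node SDK (assuming the openai npm package v4+; this bypasses langchainjs entirely):

import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const completion = await client.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Say this is a test" }],
});

// Non-streaming responses include a usage block with exact token counts.
console.log(completion.usage);
// e.g. { prompt_tokens: ..., completion_tokens: ..., total_tokens: ... }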


theTechGoose avatar Apr 30 '24 13:04 theTechGoose

> (quoting clovisrodriguez's agent example and zero-token output above)

same issue

gkhngyk avatar May 18 '24 14:05 gkhngyk

@bracesproul Brace, I think the 0 token issue is a very serious problem, any chance you can look into it?

gkhngyk avatar May 18 '24 15:05 gkhngyk

Hey community, I created this counter. It might not be perfect, but I tested it against LangSmith and it gets a pretty close count. If you have any ideas to improve it, everyone is more than welcome to do so. I hope you find it useful:

import { encodingForModel } from 'js-tiktoken';

export class TokenCounter {
    private _totalTokens: number = 0;
    private _promptTokens: number = 0;
    private _completionTokens: number = 0;
    private _enc: any;

    constructor(model) {
        this._enc = encodingForModel(model);
    }

    encodeAndCountTokens(text: string): number {
        return this._enc.encode(text).length;
    }

    handleLLMEnd(result: any) {
        result.generations.forEach((generation: any) => {
            const content = generation[0]?.message?.text || '';
            const calls = generation[0]?.message?.additional_kwargs || '';
            console.log('Calls & Content:', {
                calls,
                content,
            });
            const output = JSON.stringify(calls, null, 2);
            const tokens = this.encodeAndCountTokens(content + output);
            this._completionTokens += tokens;
        });
        console.log('Tokens for this LLMEnd:', this._completionTokens);
    }

    handleChatModelStart(_, args) {
        args[0].forEach((arg) => {
            const content = arg?.content || '';
            const calls = arg?.additional_kwargs || '';

            const tokens = this.encodeAndCountTokens(
                content + JSON.stringify(calls, null, 2),
            );
            this._promptTokens += tokens;
            console.log('content:', content, calls);
        });

        console.log('Tokens for this ChatModelStart:', this._promptTokens);
    }

    modelTracer() {
        return {
            handleChatModelStart: this.handleChatModelStart.bind(this),
            handleLLMEnd: this.handleLLMEnd.bind(this),
        };
    }

    sumTokens() {
        this._totalTokens = this._promptTokens + this._completionTokens;
        console.log('Total Tokens:', this._totalTokens);
    }
}
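A possible way to wire it up (a usage sketch, not part of the snippet above; the model names and prompt are placeholders):

import { ChatOpenAI } from "@langchain/openai";

// "gpt-4" is passed to js-tiktoken's encodingForModel; gpt-4 and gpt-4-turbo share the cl100k_base encoding.
const counter = new TokenCounter("gpt-4");

const llm = new ChatOpenAI({
  modelName: "gpt-4-turbo",
  streaming: true,
  // modelTracer() returns the bound handleChatModelStart / handleLLMEnd handlers.
  callbacks: [counter.modelTracer()],
});

await llm.invoke("What is LangChain?");
counter.sumTokens(); // logs the prompt + completion estimate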

clovisrodriguez avatar May 21 '24 03:05 clovisrodriguez

> (quoting clovisrodriguez's TokenCounter snippet above)

I will try your solution as soon as possible, thank you very much.

The problem is that LangSmith often shows 0 tokens, which makes a very important piece of LangSmith functionality unusable because of this problem in LangChain. I hope @bracesproul or @jacoblee93 will look into this issue.

gkhngyk avatar May 22 '24 14:05 gkhngyk

Yes, will fix this as OpenAI recently added support. There is an open PR here https://github.com/langchain-ai/langchainjs/pull/5485

jacoblee93 avatar May 22 '24 18:05 jacoblee93

Hey @jacoblee93. I just tested Release 0.2.4 and it still does not show the token usage when using RunnableSequence.

The code:

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.0 });
const vectorStore = await FaissStore.load(`data/search_index_${projectId}.pkl`, new OpenAIEmbeddings());
const vectorStoreRetriever = vectorStore.asRetriever();

const SYSTEM_TEMPLATE = `...`;
const messages = [
  SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
  HumanMessagePromptTemplate.fromTemplate("{question}"),
];
const prompt = ChatPromptTemplate.fromMessages(messages);

const chain = RunnableSequence.from([
  {
    sourceDocuments: RunnableSequence.from([
      (input) => input.question,
      vectorStoreRetriever,
    ]),
    question: (input) => input.question,
  },
  {
    sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
    question: (previousStepResult) => previousStepResult.question,
    context: (previousStepResult) =>
      formatDocumentsAsString(previousStepResult.sourceDocuments),
  },
  {
    result: prompt.pipe(llm).pipe(new StringOutputParser()),
    sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
  },
]);

return await chain.stream({ question: question }, {
  callbacks: [
    {
      handleLLMEnd(output: LLMResult, runId: string, parentRunId?: string, tags?: string[]): any {
        output.generations.map((g) => console.log(JSON.stringify(g, null, 2)));
      }
    }
  ]
});

The output is as follows:

 [
   {
     "text": "<the loooong answer goes here>",
     "generationInfo": {
       "prompt": 0,
       "completion": 0,
       "finish_reason": "stop"
     },
     "message": {
       "lc": 1,
       "type": "constructor",
       "id": [
         "langchain_core",
         "messages",
         "AIMessageChunk"
       ],
       "kwargs": {
         "content": "<the loooong answer goes here>",
         "additional_kwargs": {},
         "response_metadata": {
           "prompt": 0,
           "completion": 0,
           "finish_reason": "stop"
         },
         "tool_call_chunks": [],
         "tool_calls": [],
         "invalid_tool_calls": []
       }
     }
   }
 ]

Am I looking for the token count in the wrong place? Or has providing the token count in the handleLLMEnd callback not been implemented yet?

zaiddabaeen avatar Jun 02 '24 01:06 zaiddabaeen

Can you verify you're on latest version of core and LangChain OpenAI?

https://js.langchain.com/v0.2/docs/how_to/installation/#installing-integration-packages

Otherwise will check tomorrow

jacoblee93 avatar Jun 02 '24 04:06 jacoblee93

Yes definitely:

❯ npm list
[email protected] ...
├── @langchain/[email protected]
├── @langchain/[email protected]
├── @langchain/[email protected]
...
├── [email protected]
...

All these packages are on latest.

zaiddabaeen avatar Jun 03 '24 23:06 zaiddabaeen

I see openai just released an update for that

https://cookbook.openai.com/examples/how_to_stream_completions#4-how-to-get-token-usage-data-for-streamed-chat-completion-response

and seems like it was already done via this PR
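For reference, a minimal sketch of that option against the raw OpenAI Node SDK (assuming a recent openai v4 release that supports stream_options; how langchainjs surfaces it is a separate question):

import OpenAI from "openai";

const client = new OpenAI();

const stream = await client.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Say this is a test" }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  if (chunk.usage) {
    // The final chunk has empty choices and carries the usage totals.
    console.log(chunk.usage);
  }
}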

niztal avatar Jun 07 '24 02:06 niztal

EDIT: I should add that I'm using LangChain Agents. I'm guessing support for token usage hasn't reached them yet.

Unfortunately I am also on the latest packages and get a 0 token count, even for the last chunk that is supposed to contain usage. Zero counts are happening for the handleLLMEnd callback, the last message of .streamEvents, and the .invoke response.

{
  "generations": [
    [
      {
        "text": "Hi there! How can I assist you today?",
        "generationInfo": {
          "prompt": 0,
          "completion": 0,
          "finish_reason": "stop"
        },
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "Hi there! How can I assist you today?",
            "additional_kwargs": {},
            "response_metadata": {
              "prompt": 0,
              "completion": 0,
              "finish_reason": "stop"
            },
            "tool_call_chunks": [],
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ]
}
{
  "generations": [
    [
      {
        "text": "removed for readability",
        "generationInfo": {
          "prompt": 0,
          "completion": 0,
          "finish_reason": "stop"
        },
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "removed for readability",
            "additional_kwargs": {},
            "response_metadata": {
              "prompt": 0,
              "completion": 0,
              "finish_reason": "stop"
            },
            "tool_call_chunks": [],
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ]
}
  ├── @langchain/[email protected]
  ├── @langchain/[email protected]
  ├── @langchain/[email protected]
    ├── [email protected]

rrichc avatar Jun 07 '24 22:06 rrichc