
Total token count of openai callback does not count embedding usage

Open tmbo opened this issue 1 year ago • 5 comments

When using embeddings, the total_tokens count of a callback is wrong, e.g. the following example currently returns 0 even though it shouldn't:


from langchain.callbacks import get_openai_callback
from langchain.embeddings import OpenAIEmbeddings

with get_openai_callback() as cb:
    embeddings = OpenAIEmbeddings()
    embeddings.embed_query("hello")
    print(cb.total_tokens)  # prints 0, even though tokens were consumed

IMO this is confusing (and there is no way to get the cost from the embeddings class at the moment).

tmbo avatar Feb 08 '23 19:02 tmbo

Counting tokens used by embeddings is not currently supported - this is a good feature request though

hwchase17 avatar Feb 11 '23 07:02 hwchase17

I will start to work on this as a first issue. Will try to submit a draft in about two weeks. I am following the guidelines and hoping for a successful PR. :)

benheckmann avatar Mar 16 '23 10:03 benheckmann

For those looking for a quick workaround to count tokens while waiting for the PR to be merged:

import tiktoken

def num_tokens_from_string(string: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string("tiktoken is great!")

Note that tiktoken is a library from OpenAI.
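
If you'd rather not hardcode the encoding name, tiktoken can also resolve it from the model name (encoding_for_model is part of tiktoken's API; for text-embedding-ada-002 it resolves to cl100k_base):

import tiktoken

# Resolve the encoding from the model name instead of hardcoding it.
encoding = tiktoken.encoding_for_model("text-embedding-ada-002")
print(len(encoding.encode("tiktoken is great!")))  # same count as above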

thaiminhpv avatar May 08 '23 14:05 thaiminhpv

You have no idea how expensive a large file transfer can be

anignx avatar May 18 '23 09:05 anignx

@thaiminhpv appreciate the workaround! Noob question:

If I'm doing something like...

db = Chroma.from_documents(texts, embeddings)

What would I pass into your function num_tokens_from_string? Iterate through texts and pass each one in? And those would just be the prompt tokens, right? How could I estimate the completion tokens used in the above call?

Thanks!

zacharypodbela avatar May 18 '23 16:05 zacharypodbela

@zacharypodbela As far as I understand:

total tokens used = prompt tokens + completion tokens

For embedding API models, completion tokens is always 0, so prompt tokens = total tokens used.

So I think your approach is sufficient, and the completion tokens used is 0.
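
A minimal sketch of that approach, assuming texts is the list of Document chunks you pass to Chroma.from_documents and reusing num_tokens_from_string from above:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Embedding calls consume prompt tokens only, so summing over every
# chunk gives the total token usage for the indexing step.
total_tokens = sum(num_tokens_from_string(t.page_content) for t in texts)
print(f"Estimated embedding tokens: {total_tokens}")

db = Chroma.from_documents(texts, OpenAIEmbeddings())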

thaiminhpv avatar Jun 13 '23 13:06 thaiminhpv

I can confirm that embedding models such as text-embedding-ada-002 only consume prompt tokens.

However, I still must be missing something here, because the number of tokens returned by the num_tokens_from_string function is exactly half of what's shown on my OpenAI API usage page (e.g. 8,263 vs. 16,526). I'm using the code below:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=self.chunk_size.get(), chunk_overlap=self.chunk_overlap.get())
texts = text_splitter.split_documents(documents)
for text in texts:
    self.tokens += text2tokens("text-embedding-ada-002", text.page_content)
self.price = tokens2price("text-embedding-ada-002", "embedding", self.tokens)
db = Chroma.from_documents(texts, OpenAIEmbeddings(model="text-embedding-ada-002"), persist_directory=self.database_directory)

https://github.com/sbslee/kanu/blob/06deef8ae91ba2b949c5e504a220f3fcdace9cf9/kanu/docgpt.py#L152-L157

As you can see, I'm using the text2tokens method, which is essentially equivalent to num_tokens_from_string, to count the tokens for each text before summing them, and then tokens2price to convert the total into a price.
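
For anyone without those helpers, the price calculation itself boils down to a rate multiplication, roughly like the sketch below (the $0.0001 per 1K tokens rate for text-embedding-ada-002 is an assumption; check OpenAI's current pricing page):

def tokens_to_price(num_tokens: int, rate_per_1k_tokens: float = 0.0001) -> float:
    """Rough cost estimate: tokens times the model's per-1K-token rate (assumed)."""
    return num_tokens / 1000 * rate_per_1k_tokens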

One thing I noticed from my OpenAI API page is that it said I made 2 requests even though I only created a single Chroma db:

12:40 PM
Local time: Jun 19, 2023, 9:40 PM
text-embedding-ada-002-v2, 2 requests
16,526 prompt + 0 completion = 16,526 tokens

I would greatly appreciate any insight into why I am encountering this discrepancy and why there are two requests instead of just one.

sbslee avatar Jun 17 '23 05:06 sbslee

I've made an interesting observation and thought I would share. I noticed that when I remove the persist_directory option, my OpenAI API page correctly displays the total number of tokens and the number of requests. In order to understand how tokens are consumed, I have been attempting to decipher the code for both langchain and chromadb, but unfortunately, I haven't had any luck. I hope this information is helpful for others who are trying to understand what's happening exactly.

sbslee avatar Jun 21 '23 02:06 sbslee

One thing I noticed from my OpenAI API page is that it said I made 2 requests even though I only created a single Chroma db:

12:40 PM
Local time: Jun 19, 2023, 9:40 PM
text-embedding-ada-002-v2, 2 requests
16,526 prompt + 0 completion = 16,526 tokens

I would greatly appreciate any insight into why I am encountering this discrepancy and why there are two requests instead of just one.

@sbslee I was working on the same thing of counting embedding tokens when indexing. The maximum number of tokens that can be handled by text-embedding-ada-002 in a single request is 8192. So, if the documents total more than 8192 tokens, there will be math.floor(totalTokens/8192) API calls; the total consumed is then totalTokens * API calls. (Of course you don't need to do this if totalTokens is <= 8192.) Disclaimer: I don't know if this is a solid way of counting, but it's working for me and I've checked that it matches the API usage on my OpenAI account. I will use this until langchain handles this problem by itself.
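
In code, the request-count heuristic above looks roughly like this (a sketch of the heuristic described in this comment, not an official formula; actual batching depends on the client):

import math

def estimate_embedding_requests(total_tokens: int, max_per_request: int = 8192) -> int:
    # One request if everything fits; otherwise split by the per-request
    # limit, per the heuristic above (actual client batching may differ).
    if total_tokens <= max_per_request:
        return 1
    return math.floor(total_tokens / max_per_request)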

manuel-84 avatar Jun 26 '23 13:06 manuel-84

@manuel-84,

Thanks for your input! I didn't realize the max input tokens for text-embedding-ada-002 is 8,192. I can confirm this from their website.

However, as you can see from my previous post (i.e. not using persist_directory somehow gave the correct token estimation and the number of API calls) and also from the below example, it appears that the model is capable of handling more than 8,192 tokens in one API call.

1:00 AM
Local time: Jun 24, 2023, 10:00 AM
text-embedding-ada-002-v2, 1 request
17,205 prompt + 0 completion = 17,205 tokens

Maybe their latest ada model text-embedding-ada-002-v2 has a higher token limit -- I could not find any information regarding this.

At least for me, the critical factor has been whether or not I use the persist_directory option.

sbslee avatar Jun 27 '23 00:06 sbslee

At least for me, the critical factor has been whether or not I use the persist_directory option.

Maybe that's not the critical factor: in my case I'm using the FAISS index (not Chroma), and I'm having the same problem, which seems to be related to using OpenAIEmbeddings.

manuel-84 avatar Jun 27 '23 11:06 manuel-84

Sorry to open this again. I am using 0.0.190 and I still have the same issue. Has this been resolved?

lalitkumarj avatar Jul 04 '23 19:07 lalitkumarj

++ Using 0.0.211 and also waiting for this feature))

andruhus avatar Jul 12 '23 15:07 andruhus

waiting for this too

axiangcoding avatar Jul 24 '23 03:07 axiangcoding

Is there anything new with this?

YanDavKMS avatar Sep 07 '23 09:09 YanDavKMS

Hi, I'd like to chime in here on the approach. First, this is very much needed, but I think, sticking with the callback standard, we should probably add something to the callback manager along the lines of embedding start and embedding end events, and let us tap in that way. It may also mean adding a new token type, called embedding, to the response object. We have had to write our own token counter, because in many instances different models are used and you want to know how many tokens are consumed per model. To complicate this further, it would be really great to make sure that calls and responses for embeddings are traced in LangSmith, which would need changes to how the callbacks are done. I don't think the callback manager is passed to the embeddings wrapper. Thoughts?

PvanHengel avatar Sep 08 '23 15:09 PvanHengel

Callbacks can only be called by chat_models; they cannot be called in embeddings or vectorstores. LangSmith's LangChainTracer is also not compatible with embeddings because it is designed around callbacks. Are there any good solutions for token/cost calculation of embeddings and vectorstores?

lvisdd avatar Sep 09 '23 07:09 lvisdd

Agree 100%, that's basically my point: if we want to be able to trace calls to embeddings, track their token usage, and report progress, we need the callback manager in the embeddings wrapper, which we don't have. My proposal would be to pass the callback manager down and provide new embeddings events, e.g. embedding_start and embedding_end, that could be used by the tracer as well as by other custom callbacks such as the token usage use case. It would also allow developers to show better progress reporting if they desire. We had considered building an extension wrapper for this, but it would require quite a bit of custom work that really belongs upstream. I don't think it would be overly complex to add to the base objects.
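
To make that concrete, here is a rough sketch of what such a handler could look like. The on_embedding_start/on_embedding_end hooks are hypothetical; they do not exist in BaseCallbackHandler today, which is exactly the gap being described:

from langchain.callbacks.base import BaseCallbackHandler

class EmbeddingTokenCounter(BaseCallbackHandler):
    """Hypothetical handler built on the proposed on_embedding_* hooks."""

    def __init__(self) -> None:
        self.embedding_tokens = 0

    def on_embedding_start(self, serialized: dict, texts: list, **kwargs) -> None:
        # Proposed hook: fired before an embedding request is sent.
        pass

    def on_embedding_end(self, response: dict, **kwargs) -> None:
        # Proposed hook: the response would carry a usage field,
        # analogous to llm_output["token_usage"] on LLM results.
        self.embedding_tokens += response.get("usage", {}).get("total_tokens", 0)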

PvanHengel avatar Sep 12 '23 10:09 PvanHengel

Any updates?

svenez avatar Oct 25 '23 05:10 svenez

@svenez yes and no. We ended up writing/extending our own token counter, which uses a custom callback we pass around. The good news is that the most recent version of langchain now has a wrapper around the retriever, which makes it easier to get a handle to the token counter callback, and also makes your call appear in LangSmith (which is amazing, @hwchase17 - works well). The next steps will be to extend the base callback handler with on-retriever start and end hooks, modify the out-of-the-box counter callback to include them, and then for team LangSmith to explain how to get the token count added to the chain view in their UI, which I assume would be pretty straightforward if we follow some standard approach.

Also, unrelated but important: as custom callbacks are amazingly useful, it would be nice if we had a utility method that can call custom hooks on the run manager across all child callbacks and, if a child callback doesn't have the hook, throw a warning instead of failing. Right now we have a ton of looping code to iterate through the callbacks and see which ones contain an extended method. Maybe I need to lift the hood and dig a bit deeper into the callback manager, but I'm open to ideas here; maybe I'll open another issue later, but it's related to this one.
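
A sketch of the kind of utility I mean (all names here are made up): walk the child callbacks, call the hook where it exists, and warn instead of failing where it doesn't:

import warnings

def dispatch_optional_hook(handlers, hook_name, *args, **kwargs):
    """Call hook_name on every handler that defines it; warn (don't fail) otherwise."""
    for handler in handlers:
        hook = getattr(handler, hook_name, None)
        if callable(hook):
            hook(*args, **kwargs)
        else:
            warnings.warn(f"{type(handler).__name__} has no hook {hook_name!r}; skipping")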

PvanHengel avatar Oct 25 '23 11:10 PvanHengel

any news about this feature?

SamadovSh avatar Dec 07 '23 12:12 SamadovSh

Voting up, this feature would be really valuable to have to track costs for computing embeddings and executing vector store operations.

DSamuylov avatar Dec 10 '23 12:12 DSamuylov

Definitely a vote up. Embeddings on OpenAI are more expensive compared to regular prompts/chats for our use cases, so this visualisation is needed.

rishumehrotra avatar Dec 11 '23 02:12 rishumehrotra