llm-numbers
Calculation of "5:1 -- Cost Ratio of generation of text using GPT-3.5-Turbo vs OpenAI embedding"
Hi, could you share the calculation for this one, and maybe add it to the footnote? I'm not sure I understand it.
Here are the numbers I find on the OpenAI pricing page:
GPT-3.5 Turbo text generation

| Model | Input | Output |
|---|---|---|
| 4K context | $0.0015 / 1K tokens | $0.002 / 1K tokens |
| 16K context | $0.003 / 1K tokens | $0.004 / 1K tokens |
Embeddings
| Model | Usage |
|---|---|
| Ada v2 | $0.0001 / 1K tokens |
You give the example of answering "What is the capital of Delaware?". If you had to answer this question with an LM that doesn't have the info in its weights, you would have to embed all the documents of a corpus that contains the answer. You could choose an arbitrarily narrow scope, but it could also be the whole of Wikipedia, which is something like 5.6B tokens and would cost something like 5.6e9 / 1000 × $0.0001 = $560 just to index.
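For reference, here is that back-of-the-envelope calculation as a short script (a minimal sketch, using the Ada v2 price above and the ~5.6B-token estimate for Wikipedia):

```python
# One-time cost of embedding a Wikipedia-sized corpus with Ada v2.
# Assumes ~5.6B tokens and the $0.0001 / 1K tokens price quoted above.
corpus_tokens = 5.6e9
embed_price_per_1k = 0.0001  # USD per 1K tokens (Ada v2)

indexing_cost = corpus_tokens / 1000 * embed_price_per_1k
print(f"One-time indexing cost: ${indexing_cost:,.0f}")  # -> $560
```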
What am I missing?
Thanks!
Hey Theo,
Thanks for reaching out!
The point is that you don't use an LM; you use a vector database. You embed questions like capitals in a semantic search index. This includes open-source projects like FAISS and Chroma, and commercial products like Vectara, Pinecone, etc.
These are all examples of the general body of techniques called retrieval-augmented generation.
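To make the retrieval side concrete, here is a minimal sketch of a semantic search index built with FAISS; the corpus size, dimension, and random stand-in vectors are illustrative assumptions, not numbers from this thread:

```python
# Minimal semantic-search sketch with FAISS (pip install faiss-cpu numpy).
# Random vectors stand in for real embeddings; in practice you would call an
# embedding model such as Ada v2 to produce them.
import faiss
import numpy as np

dim = 1536       # Ada v2 embedding dimension
n_docs = 10_000  # illustrative corpus size

rng = np.random.default_rng(0)
doc_vectors = rng.random((n_docs, dim), dtype=np.float32)

index = faiss.IndexFlatL2(dim)  # exact L2 nearest-neighbor index
index.add(doc_vectors)          # one-time indexing step (where the embedding cost goes)

# Embed the question the same way, then look up the nearest documents.
query = rng.random((1, dim), dtype=np.float32)
distances, ids = index.search(query, 3)
print(ids[0])  # indices of the 3 closest documents
```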
Thank you for your answer, Waleed!
I understand the tools you mentioned help with the retrieval part. However, when doing retrieval-augmented generation you have a retriever and a generator, the generator being a language model.
I understand you could just retrieve the document without using a language model, but that would just be called document retrieval.
In both scenarios, creating embeddings, indexing, and performing semantic search are necessary steps. Regarding the 5:1 ratio, you mentioned vector lookup being considered free, but I'm curious about the calculation behind it, especially given the potential expense with large document sets.
Add to that the cost of the generator, which I'm sure would be cheaper than an API call to GPT-3.5 Turbo, since you don't need a model that big once you feed it the info it needs on a case-by-case basis, but it still requires infrastructure to run on.
Could you please provide insight into the calculation for the 5:1 ratio?
Appreciate your help!
Sure, here's the calculation breakdown:
For text generation:

- GPT-3.5 Turbo with 4K context: $0.0015 per 1K input tokens, $0.002 per 1K output tokens.
- GPT-3.5 Turbo with 16K context: $0.003 per 1K input tokens, $0.004 per 1K output tokens.

For embeddings:

- Ada v2: $0.0001 per 1K tokens.

You're correct in your example about embedding documents to answer questions. Let's say you need to index the entire Wikipedia (around 5.6 billion tokens). The cost would be approximately:

Cost = (5.6e9 / 1000) × $0.0001 = $560
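To illustrate why the per-question lookup is treated as roughly free once the index exists, here is a small comparison sketch; the token counts are illustrative assumptions, not numbers from this thread:

```python
# Marginal per-question cost under the prices quoted above.
# Token counts are hypothetical, chosen only to illustrate the comparison.
GEN_INPUT = 0.0015 / 1000  # USD per token, GPT-3.5 Turbo 4K input
GEN_OUTPUT = 0.002 / 1000  # USD per token, GPT-3.5 Turbo 4K output
EMBED = 0.0001 / 1000      # USD per token, Ada v2

prompt_tokens, answer_tokens = 200, 50  # hypothetical prompt and completion sizes
query_tokens = 10                       # e.g. "What is the capital of Delaware?"

generation_cost = prompt_tokens * GEN_INPUT + answer_tokens * GEN_OUTPUT
lookup_cost = query_tokens * EMBED      # the vector search itself adds ~no token cost

print(f"Generate an answer: ${generation_cost:.6f} per question")
print(f"Embed + look up:    ${lookup_cost:.6f} per question")
```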
If you have further questions or need clarification, feel free to ask!