gpt4-pdf-chatbot-langchain

Total costs $ for the 56-Page PDF Document vs 1 query

Open databill86 opened this issue 2 years ago • 6 comments

Hello,

Could you give us an idea of the total costs for the 56-page document, given one query:

  • creating the embedding (a one time step)
  • storing the embeddings in Pinecone
  • matching a query of 250 tokens vs. the embedding: costs of ADA, and costs of the query to Pinecone
  • the first query to gpt4: chat history + the query
  • the second query to gpt4: standalone question + relevant documents

It seems like a lot of queries, so it would be very helpful to have an idea of these costs.

Btw, thank you for this tutorial!

databill86 avatar Mar 22 '23 09:03 databill86

This is a fantastic idea!

Maybe adding a small counter of dollars spent to the front end could save you from a heart attack when the credit card bill rolls in.

alfredo-f avatar Mar 22 '23 13:03 alfredo-f

Let me look into this and get back to you shortly.

mayooear avatar Mar 23 '23 00:03 mayooear

Also curious about this! Not sure how much money I'd burn through if I used this.

kimjongbing avatar Mar 24 '23 10:03 kimjongbing

Embeddings cost $0.0004 / 1K tokens, so they are very cost-effective (1 token is approx. 3/4 of a word). Using OpenAI's [tokenizer](https://platform.openai.com/tokenizer?view=bpe), you can see how tokens are calculated. For example, if you're embedding a 50-page PDF, that's approx. 25,000 words, which is approx. 33,000 tokens, or about $0.013.

For context, embedding 1.2 million tokens costs approx. 0.48 USD.
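The arithmetic above can be reproduced with a small sketch. It uses only the two figures quoted in this comment (the $0.0004 / 1K tokens rate and the 3/4-words-per-token rule of thumb); the function names are hypothetical, and a real token count should come from a tokenizer rather than a word count.

```python
# Rate quoted in this thread: $0.0004 per 1K embedding tokens.
EMBEDDING_USD_PER_1K_TOKENS = 0.0004
# Rule of thumb quoted in this thread: 1 token is roughly 3/4 of a word.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from a word count (hypothetical helper)."""
    return round(word_count / WORDS_PER_TOKEN)


def embedding_cost_usd(token_count: int,
                       usd_per_1k: float = EMBEDDING_USD_PER_1K_TOKENS) -> float:
    """Cost in USD of embedding `token_count` tokens at a per-1K-token rate."""
    return token_count / 1000 * usd_per_1k


if __name__ == "__main__":
    tokens = estimate_tokens(25_000)  # the 50-page, ~25,000-word example
    print(f"~{tokens} tokens, ~${embedding_cost_usd(tokens):.4f} to embed")
    print(f"1.2M tokens: ~${embedding_cost_usd(1_200_000):.2f}")
```

Because embedding is a one-time step per document, this cost is paid once at ingestion and does not recur per query.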

With respect to Pinecone pricing, the free tier is very generous, but paid/production-level pricing is here

As for the gpt-4 calls, I will continue to review the intermediate steps and get back on that shortly.
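Until those numbers are available, the per-query cost can be sketched from the steps listed in the original question: embed the query, make one GPT-4 call with chat history + query, then a second call with standalone question + relevant documents. This is a rough estimator, not a measurement of this repository. The GPT-4 rates below are placeholder assumptions (they are not stated in this thread, so check the current pricing page), and all token counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1K tokens."""
    embedding_per_1k: float = 0.0004  # quoted in this thread
    prompt_per_1k: float = 0.03       # ASSUMPTION: placeholder GPT-4 prompt rate
    completion_per_1k: float = 0.06   # ASSUMPTION: placeholder GPT-4 completion rate


def chat_call_cost(prompt_tokens: int, completion_tokens: int, rates: Rates) -> float:
    """Cost of one chat completion call: prompt and completion are billed separately."""
    return (prompt_tokens / 1000 * rates.prompt_per_1k
            + completion_tokens / 1000 * rates.completion_per_1k)


def query_cost(query_tokens: int, history_tokens: int, standalone_tokens: int,
               context_tokens: int, answer_tokens: int,
               rates: Rates = Rates()) -> dict:
    """Break down the cost of one query into the steps from the question."""
    embed = query_tokens / 1000 * rates.embedding_per_1k
    # First call: chat history + query in, standalone question out.
    condense = chat_call_cost(history_tokens + query_tokens, standalone_tokens, rates)
    # Second call: standalone question + retrieved documents in, answer out.
    answer = chat_call_cost(standalone_tokens + context_tokens, answer_tokens, rates)
    return {"embed_query": embed, "condense_question": condense,
            "answer": answer, "total": embed + condense + answer}


if __name__ == "__main__":
    # Hypothetical token counts; only the 250-token query comes from the question.
    for step, usd in query_cost(250, 500, 60, 3000, 300).items():
        print(f"{step}: ${usd:.4f}")
```

Under these assumptions, the second GPT-4 call dominates because it carries the retrieved document chunks, so the number and size of chunks passed as context is the main lever on per-query cost. Pinecone query costs are not modeled here.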

mayooear avatar Mar 26 '23 04:03 mayooear

About Pinecone pricing, it would be possible to switch to pgvector for self-hosting.

sebastienfi avatar Apr 03 '23 13:04 sebastienfi

> About Pinecone pricing, it would be possible to switch to pgvector for self-hosting.

I think there are a good number of vector database alternatives referenced by OpenAI in the chatgpt-retrieval-plugin repository. They didn't mention pgvector, but I wonder if it's possible to plug Weaviate or Redis in here.

databill86 avatar Apr 03 '23 16:04 databill86

Hi, @databill86! I'm Dosu, and I'm here to help the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are requesting information on the total costs associated with processing a 56-page PDF document with one query. There have been discussions about the cost-effectiveness of embeddings and the pricing of Pinecone, as well as a suggestion to switch to pgvector for self-hosting. However, the issue remains unresolved.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the project!

dosubot[bot] avatar Sep 24 '23 16:09 dosubot[bot]