gpt4-pdf-chatbot-langchain
Total cost ($) for a 56-page PDF document vs. 1 query
Hello,
Could you give us an idea of the total cost for a 56-page document, given 1 query:
- creating the embedding (a one time step)
- storing the embeddings in Pinecone
- matching a query of 250 tokens against the embeddings: the cost of ada, plus the cost of the query to Pinecone
- the first query to GPT-4: chat history + the query
- the second query to GPT-4: standalone question + relevant documents
It seems like a lot of queries; it would be very helpful to have an idea of these costs.
Btw, thank you for this tutorial!
This is a fantastic idea!
Maybe adding a small counter of dollars spent in the front-end could save you from a heart attack when the credit-card bill rolls in.
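If anyone wants to prototype that counter, here's a minimal sketch of the bookkeeping (just an illustration; the per-1K-token prices below are placeholder assumptions, not authoritative rates):

```python
# Minimal running-cost counter: accumulate token usage per call type and
# convert to dollars so a front-end could display a live total.
# NOTE: the per-1K-token prices here are placeholder assumptions.
PRICES_PER_1K = {
    "embedding": 0.0004,
    "gpt-4-prompt": 0.03,
    "gpt-4-completion": 0.06,
}

class CostCounter:
    def __init__(self) -> None:
        self.total_usd = 0.0

    def add(self, kind: str, tokens: int) -> float:
        """Record one call's token usage; return the running total in USD."""
        self.total_usd += tokens / 1000 * PRICES_PER_1K[kind]
        return self.total_usd

counter = CostCounter()
counter.add("embedding", 33_000)      # embed the PDF (one-time)
counter.add("gpt-4-prompt", 1_500)    # condensed question + retrieved context
counter.add("gpt-4-completion", 250)  # model's answer
print(f"${counter.total_usd:.4f}")    # prints $0.0732 with these assumptions
```

In a real app you'd feed this from the `usage` field that the OpenAI API returns with each response, rather than hard-coded token counts.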
Let me look into this and get back to you shortly.
Also curious about this! Not sure how much money I'd burn through if I used this.
Embeddings cost $0.0004 / 1K tokens, so they are very cost-effective (1 token is approx. 3/4 of a word). Using OpenAI's [tokenizer](https://platform.openai.com/tokenizer?view=bpe), you can see how tokens are counted. For example, embedding a 50-page PDF of approx. 25,000 words, which is approx. 33,000 tokens, costs about $0.013.
For context, embedding 1.2 million tokens costs approx. 0.48 USD.
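To make the arithmetic concrete, here is a minimal sketch of the embedding-cost estimate. The $0.0004/1K-token rate and the ~3/4-word-per-token ratio are the figures quoted above; the words-per-page count is a rough assumption:

```python
# Rough one-time embedding-cost estimate for a PDF.
# Rate and token ratio are the figures quoted above; words/page is assumed.
EMBED_PRICE_PER_1K_TOKENS = 0.0004  # ada-002 embedding rate
WORDS_PER_TOKEN = 0.75              # ~3/4 of a word per token

def embedding_cost(pages: int, words_per_page: int = 500) -> float:
    """Estimate the one-time cost (USD) of embedding a document."""
    words = pages * words_per_page
    tokens = words / WORDS_PER_TOKEN
    return tokens / 1000 * EMBED_PRICE_PER_1K_TOKENS

# A 56-page PDF at ~500 words/page -> 28,000 words -> ~37,333 tokens
print(f"${embedding_cost(56):.4f}")  # prints $0.0149
```

So even for the 56-page document in the original question, the embedding step is on the order of a cent or two.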
With respect to Pinecone pricing, the free tier is very generous; paid/production-level pricing is here.
As for the GPT-4 calls, I will continue to review the intermediate steps and get back on that shortly.
Regarding Pinecone pricing, it would be possible to switch to pgvector for self-hosting.
I think there are a good number of vector database alternatives referenced by OpenAI in the ChatGPT retrieval plugin repository. They didn't mention pgvector, but I wonder if it's possible to plug Weaviate or Redis in here.
Hi, @databill86! I'm Dosu, and I'm here to help the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, you are requesting information on the total costs associated with processing a 56-page PDF document with one query. There have been discussions about the cost-effectiveness of embeddings and the pricing of Pinecone, as well as a suggestion to switch to pgvector for self-hosting. However, the issue remains unresolved.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the project!