Token Consumption Discrepancy in Dify LLM Node Due to Base64 Image Conversion
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please submit issues in English; otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
Version 0.14.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
When using the Dify LLM node for visual recognition, the token and USD consumption it reports are incorrect. For image recognition, the node converts images to base64 and sends them to GPT. Our images are typically around 10 MB, and payloads that large inevitably drive up token consumption; Dify, however, displays much lower token and USD usage, which misled us during troubleshooting.
We eventually confirmed that the extra token consumption came from the base64-encoded images being sent to GPT. Please verify the current behavior and address this issue, as it directly impacts costs, which is a critical concern for us. Thank you!
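For context, a vision request to an OpenAI-style chat completions endpoint typically embeds the image as a base64 data URI inside the message content. A minimal sketch of the request shape (the file path is a placeholder):

```python
import base64

# Read a local image and embed it as a base64 data URI ("sample.jpg" is a
# placeholder path). Base64 inflates the payload by about 33%, so a 10 MB
# image becomes roughly 13 MB of text in the request body.
with open("sample.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
        },
    ],
}
```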
✔️ Expected Behavior
Accurately report the actual token and cost consumption so we can calculate expenses.
❌ Actual Behavior
Incorrect token and cost values were returned.
The token consumption discrepancy you're experiencing in Dify's LLM node when handling base64 image conversion is likely due to the current token-calculation implementation. In the Dify codebase, the method responsible for calculating token consumption, `_num_tokens_from_messages`, counts text content but skips image content entirely; token calculation for images is simply not implemented, which would produce the discrepancies you're observing [1][2][3][4].
To address this issue, it would be necessary to implement a mechanism that accurately accounts for the token consumption of base64-encoded images. This would ensure that the reported token and cost values reflect the actual usage, helping you better manage expenses.
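For reference, OpenAI documents a tile-based formula for image input tokens on its GPT-4-class vision models: a flat 85 tokens in low detail, or 85 base tokens plus 170 per 512×512 tile after rescaling in high detail. A minimal sketch of that formula (not Dify's code) shows that the billed cost depends on image dimensions rather than file size:

```python
import math

def openai_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Token cost of one image input per OpenAI's published formula for
    GPT-4-class vision models: 85 tokens flat in low detail; in high
    detail, 85 base tokens plus 170 per 512x512 tile after rescaling."""
    if detail == "low":
        return 85
    # 1. Scale the image to fit within a 2048x2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # 2. Scale again so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 3. Count the 512x512 tiles needed to cover the result.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

# A 4032x3024 photo rescales to 1024x768 -> 2x2 tiles -> 85 + 170*4 = 765
# tokens, no matter whether the file is 2 MB or 10 MB on disk.
print(openai_image_tokens(4032, 3024))  # 765
```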
Thank you for the clarification regarding the token calculation discrepancy. Since base64 is essentially just a string, I would suggest processing it with the same token-calculation logic used for text. Treating base64 payloads as regular strings for token counting could resolve this quickly, without requiring an entirely new mechanism.
This approach might simplify the implementation and make the reported token and cost values reflect actual consumption more closely (a rough sketch follows). Let me know if additional details or use cases would help with testing.
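A minimal sketch of that suggestion, assuming tiktoken is available. Note that OpenAI actually bills image inputs via the tile-based formula above rather than by tokenizing the base64 text, so this approximation can diverge substantially from the real charge:

```python
import tiktoken

def count_base64_as_text_tokens(b64_payload: str, model: str = "gpt-4o") -> int:
    """Tokenize the base64 string with the model's text encoder, treating
    it like any other string content in the prompt."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a common encoder.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(b64_payload))
```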
The fact is, we are unable to keep the pricing information for all models up to date, so the listed cost details are for reference only. We may improve or remove this feature in the future. For now, we recommend using our integrated ops tools in combination with the models you are using to calculate costs.
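As an illustration of that recommendation, one way to get reliable numbers is to read the `usage` block the provider returns for each request and apply your own price sheet. A minimal sketch with placeholder prices:

```python
# Placeholder USD prices per 1K tokens -- substitute the current rates
# for the specific model you use.
PRICES_PER_1K = {"prompt": 0.0025, "completion": 0.0100}

def cost_from_usage(prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost computed from the token counts the provider actually billed,
    e.g. the `usage` object in an OpenAI chat completion response."""
    return (
        prompt_tokens / 1000 * PRICES_PER_1K["prompt"]
        + completion_tokens / 1000 * PRICES_PER_1K["completion"]
    )

# Example: usage reported by the API for one request.
print(f"${cost_from_usage(prompt_tokens=1234, completion_tokens=256):.6f}")
```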
Hi, @xuemusi. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- You reported an issue with the Dify LLM node's token usage reporting for base64-encoded images.
- The current method inaccurately calculates token consumption, misleading users about costs.
- You suggested treating base64 strings as regular text for accurate token counting.
- "Laipz8200" noted that cost details are for reference and recommended using integrated ops tools for accuracy.
Next Steps:
- Please let me know if this issue is still relevant to the latest version of the Dify repository by commenting here.
- If there is no further activity, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!