Knowledge API special chars encoding challenge
Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
Version 0.6.5
Cloud or Self Hosted
Cloud
Steps to reproduce
- Use the HTTP Request node to send a JSON POST request to the Knowledge API endpoint.
- Include special characters in the request body, e.g. "ä" and the strings shown in the attached screenshots.
- The issue seems to occur intermittently.
✔️ Expected Behavior
The string stored in the Knowledge dataset matches the string submitted via the API.
❌ Actual Behavior
Special characters are stored as other, unrelated special characters.
The special-character encoding issue in the Knowledge API might be influenced by several factors, including the indexing technique, the configuration of the embedding model, and how the model manager interacts with these configurations. In particular, the indexing_technique setting of 'high_quality' and the embedding model details (provider and model_type) are worth investigating. Errors in embedding model availability or provider token initialization could also lead to encoding problems.
The Dify v0.6.5 codebase is designed to handle special characters such as "ä" correctly, as indicated by its use of UTF-8 encoding, which supports these characters natively. The source code itself also contains such characters directly, for example in the German translations. The issue therefore probably does not stem from how characters are encoded in the source code, but from how they are processed or displayed in other parts of the application.
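The reported symptom (special characters stored as other, unrelated characters) resembles classic "mojibake". The following is a minimal Python sketch, assuming the corruption comes from UTF-8 bytes being decoded as Latin-1 (or Windows-1252, which maps these particular bytes the same way) somewhere along the way. That assumption has not been confirmed for Dify; the helper functions are hypothetical and only illustrate the mechanism.

```python
# Illustration only: how a UTF-8 / Latin-1 mismatch turns "ä" into two
# unrelated characters. Whether this happens inside Dify is an assumption.


def utf8_bytes(text: str) -> bytes:
    # UTF-8 encodes "ä" (U+00E4) as the two bytes 0xC3 0xA4.
    return text.encode("utf-8")


def misdecode_as_latin1(text: str) -> str:
    # Encode correctly as UTF-8, then wrongly decode each byte as Latin-1,
    # so every non-ASCII character becomes two (or more) characters.
    return utf8_bytes(text).decode("latin-1")


def repair_latin1_mojibake(garbled: str) -> str:
    # Reverse the mistake: Latin-1 maps characters back to the original bytes,
    # which can then be decoded correctly as UTF-8.
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "ä"
    print(utf8_bytes(original))             # b'\xc3\xa4'
    garbled = misdecode_as_latin1(original)
    print(garbled)                          # Ã¤
    print(repair_latin1_mojibake(garbled))  # ä
```

If the stored strings look like "Ã¤" where "ä" was sent, a charset mismatch of this kind is a likely suspect; a different pattern would point elsewhere.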
The headers set for JSON requests (Content-Type: application/json and Authorization) should not affect the encoding of special characters, because JSON text is Unicode and RFC 8259 requires JSON exchanged between systems to be encoded as UTF-8. Any encoding issue is more likely related to server-side handling or to the JSON encoding process itself.
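To separate transport problems from server-side handling, the sketch below compares two ways of serializing the same JSON body using only Python's standard json module. The field name "text" is a hypothetical placeholder, not a documented Knowledge API parameter, and the Latin-1 receiver is a simulated misconfiguration, not observed Dify behavior.

```python
import json

# Hypothetical payload: the field name "text" is illustrative only and is not
# taken from the Dify Knowledge API documentation.
payload = {"text": "Grüße: ä ö ü ß"}


def ascii_safe_body(data: dict) -> bytes:
    # ensure_ascii=True (the json.dumps default) writes non-ASCII characters
    # as \uXXXX escapes, so the resulting bytes are plain ASCII.
    return json.dumps(data, ensure_ascii=True).encode("ascii")


def utf8_body(data: dict) -> bytes:
    # ensure_ascii=False keeps literal characters; they must be sent as UTF-8.
    return json.dumps(data, ensure_ascii=False).encode("utf-8")


def received_as_latin1(body: bytes) -> dict:
    # Simulates a receiver that wrongly decodes the request body as Latin-1.
    return json.loads(body.decode("latin-1"))


if __name__ == "__main__":
    print(ascii_safe_body(payload))
    # Both bodies parse back to the original payload when decoded correctly.
    print(json.loads(ascii_safe_body(payload)) == payload)  # True
    print(json.loads(utf8_body(payload)) == payload)        # True
    # Only the escaped body survives a wrong Latin-1 decode unchanged.
    print(received_as_latin1(ascii_safe_body(payload)) == payload)  # True
    print(received_as_latin1(utf8_body(payload)) == payload)        # False
```

Escaping non-ASCII characters as \uXXXX keeps the request body pure ASCII, so a charset mismatch in transit cannot corrupt it. If the HTTP Request node body can be written with such escapes (e.g. "\u00e4" instead of "ä"), comparing the stored result may help narrow down where the corruption occurs.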
To better assist with resolving this issue, it would be beneficial to have more details about the specific conditions under which the encoding issue occurs, including the models, techniques, or parameters used. Additionally, any error messages or logs observed when the issue occurs could provide valuable insights into the underlying cause.
Sources
- api/services/dataset_service.py
- sdks/php-client/dify-client.php
- en/features/workflow/nodes/code.md
- web/i18n/de-DE/common.ts
I have the same problem; the only difference is the API URL.