
The tool description in the prompt is ASCII

Open 401557122 opened this issue 1 year ago • 5 comments

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.15

Cloud or Self Hosted

Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

May I ask why all the tool information in the prompt is in ASCII-encoded format? Is this an error? I previously fixed this issue by directly changing the ensure_ascii=True parameter of dump in my local json package. I would like to know whether this change is necessary.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

401557122 · Jul 24 '24 08:07

The tool information in the prompt is ASCII-encoded because the ensure_ascii parameter is set to True during JSON serialization, so every non-ASCII character is written as a \uXXXX escape. True is also the default value of ensure_ascii in Python's json.dumps. This is expected behavior rather than an error.

For example, the unauthorized_handler function calls json.dumps with ensure_ascii=True. This particular handler builds an HTTP error response rather than a prompt, but it shows the same setting:

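import json

from flask import Response

# login_manager is assumed to be the app's flask_login LoginManager instance, defined elsewhere.
@login_manager.unauthorized_handler
def unauthorized_handler():
    """Handle unauthorized requests with a JSON 401 response."""
    return Response(json.dumps({
        'code': 'unauthorized',
        'message': "Unauthorized."
    }, ensure_ascii=True), status=401, content_type="application/json")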

If you want to change this behavior, set ensure_ascii to False in the relevant places in the code where JSON serialization occurs [1].
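A minimal sketch of why the tool text ends up escaped, assuming the tool schema is serialized with Python's standard json.dumps. The tool dictionary below is a hypothetical example, not Dify's actual tool schema. Because ensure_ascii defaults to True, simply omitting the argument already produces \uXXXX escapes:

```python
import json

# Hypothetical tool definition with a Chinese description (not Dify's real schema).
tool = {
    "name": "weather",
    "description": "查询天气",
}

# ensure_ascii defaults to True, so every non-ASCII character is escaped.
escaped = json.dumps(tool)

# Passing ensure_ascii=False keeps the characters as-is.
readable = json.dumps(tool, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings are valid JSON for the same data; only the representation of the description differs.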

To continue talking to Dosu, mention @dosu.

dosubot[bot] · Jul 24 '24 08:07

@dosu What is the purpose of this setting?

401557122 · Jul 24 '24 08:07

Setting ensure_ascii=True in the JSON serialization process ensures that the output is ASCII-only by escaping all non-ASCII characters using \uXXXX sequences. This can be useful for compatibility with systems that may not handle non-ASCII characters properly.

Changing ensure_ascii to False allows the JSON encoder to output non-ASCII characters as-is, making the JSON data more readable and compact if it contains many non-ASCII characters.

Example:

import json

data = {"message": "Hola, ¿cómo estás?"}

# ensure_ascii=True
json_data_ascii = json.dumps(data, ensure_ascii=True)
print(json_data_ascii)  # Output: {"message": "Hola, \u00bfc\u00f3mo est\u00e1s?"}

# ensure_ascii=False
json_data_non_ascii = json.dumps(data, ensure_ascii=False)
print(json_data_non_ascii)  # Output: {"message": "Hola, ¿cómo estás?"}

In the first case, non-ASCII characters are escaped, while in the second case, they are preserved [1].
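The sketch below reuses the data from the example above to check the two trade-offs mentioned earlier: the escaped output is pure ASCII and survives an ASCII-only channel, while the unescaped output needs a Unicode encoding such as UTF-8 but is shorter. It also confirms that both strings parse back to exactly the same object. Only the standard json module is assumed:

```python
import json

data = {"message": "Hola, ¿cómo estás?"}

escaped = json.dumps(data, ensure_ascii=True)
raw = json.dumps(data, ensure_ascii=False)

# The escaped form is pure ASCII, so encoding it as ASCII succeeds.
escaped_bytes = escaped.encode("ascii")

# The unescaped form contains non-ASCII characters: it must be encoded
# as UTF-8 (or another Unicode encoding), and ASCII encoding fails.
raw_bytes = raw.encode("utf-8")
try:
    raw.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Both JSON strings decode to the same Python object.
same_data = json.loads(escaped) == json.loads(raw) == data

print(len(escaped_bytes), len(raw_bytes), ascii_ok, same_data)
```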


dosubot[bot] · Jul 24 '24 09:07

But the prompts are all ASCII-encoded. Can LLMs understand them? @dosu

401557122 · Jul 24 '24 09:07

LLMs (large language models) can generally interpret prompts whether non-ASCII characters are escaped or not. Escaping them as \uXXXX sequences makes the text much harder for humans to read and makes the prompt longer, but it does not necessarily prevent the model from understanding the content.

If readability is a concern and you want non-ASCII characters preserved in their original form, set the ensure_ascii parameter to False in the JSON serialization process. The JSON encoder will then output non-ASCII characters as-is, making the prompts more readable.
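To give a rough sense of the length overhead for non-Latin text, the sketch below compares character counts for a short, hypothetical Chinese tool description. Character count is only a proxy: the actual token count depends on the model's tokenizer, which this sketch does not measure.

```python
import json

# Hypothetical tool description in Chinese ("query a city's weather").
description = "查询城市天气"

escaped = json.dumps(description)                  # ensure_ascii=True (default)
raw = json.dumps(description, ensure_ascii=False)

# Each CJK character becomes a 6-character \uXXXX escape;
# both strings also include the two surrounding quotes.
print(len(escaped), len(raw))  # prints: 38 8
```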

For example, in the unauthorized_handler function, you can change:

return Response(json.dumps({
    'code': 'unauthorized',
    'message': "Unauthorized."
}, ensure_ascii=True), status=401, content_type="application/json")

to:

return Response(json.dumps({
    'code': 'unauthorized',
    'message': "Unauthorized."
}, ensure_ascii=False), status=401, content_type="application/json")

This change will preserve non-ASCII characters in their original form in the JSON output.
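Rather than editing each call site separately (or patching the installed json package, as described in the original report), one option is to route the relevant serialization through a single helper. This is a sketch assuming plain json.dumps calls; dumps_readable and the tools list are hypothetical names, not part of Dify:

```python
import functools
import json

# Hypothetical helper that always passes ensure_ascii=False, so every
# call site that builds prompt text serializes the same way.
dumps_readable = functools.partial(json.dumps, ensure_ascii=False)

# Hypothetical tool list, as it might be embedded in a prompt.
tools = [{"name": "translate", "description": "翻译文本"}]

# Extra keyword arguments such as indent are forwarded to json.dumps.
prompt_fragment = dumps_readable(tools, indent=2)
print(prompt_fragment)
```

Keeping the setting in one place makes it easy to switch back if some consumer of the output turns out to require ASCII-only JSON.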


dosubot[bot] · Jul 24 '24 09:07