ai-bash icon indicating copy to clipboard operation
ai-bash copied to clipboard

parse json error

Open cdhigh opened this issue 2 years ago • 14 comments

Is there a way to perform pre-escaping of characters when using non-english (portugues etc), which contains some non-ASCII characters like ç, ã, õ, etc.? Sometimes, JSON parsing errors occur that prevent the display of ChatGPT's responses, for example: jq: parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 13, column 93.

Is there any method for pre-escaping?

edited: If there are special characters in the text that needs to be sent, this error can also occur.

cdhigh avatar Jun 23 '23 18:06 cdhigh

I found some unicode strings in conversations.json. conversations.json.zip

Is it because the file was saved using Unicode encoding, and the decoding failed due to the lack of the original encoding during reading?

cdhigh avatar Jun 23 '23 20:06 cdhigh

I suspect the unicode encoding in the conversations.json file you attached are more likely due to keystrokes you (maybe inadvertently) sent while interacting with the script, rather than non-printable characters being part of the conversation. I used your json and could playback the chats just fine, can't reproduce the jq error you got in the first post. I also tried starting a new conversation using the characters you mentioned specifically, without any issues. Can you please tell me the steps to reproduce this jq error?

nitefood avatar Jun 23 '23 21:06 nitefood

Here's the test I did:

image

nitefood avatar Jun 23 '23 21:06 nitefood

Enter "como dancar lindo" to reproduce the issue.

jq

Or can you remove all characters less than 0x1f from the user-entered string?

cdhigh avatar Jun 23 '23 23:06 cdhigh

Sorry, I can't reproduce. Just copy/pasted the sentece you suggested several times and got no errors. May this have to do with the Kindle underlying OS? Maybe a localization issue? Have you tried defaulting to a UTF-8 locale? I have my default locale set on en_US.UTF-8 if that may help.

image

image

image

nitefood avatar Jun 24 '23 07:06 nitefood

Or can you remove all characters less than 0x1f from the user-entered string?

that should not be necessary since the script already "stringifies" the input using jq -Rs here - that include handling special characters whose value is less than 0x1f, e.g. 0xA for newline (LF)

nitefood avatar Jun 24 '23 07:06 nitefood

I tried some tricks by searching on google, still no luck. Are we able to determine which one is causing the error among multiple jq statements in the code?

cdhigh avatar Jun 24 '23 10:06 cdhigh

you may try to run the script using bash -x ai "como dancar lindo" to see exactly what gets called, which variables get set and all the output.

nitefood avatar Jun 24 '23 10:06 nitefood

caught this line! https://github.com/nitefood/ai-bash-gpt/blob/22e77b1a0c8c1aab6a2ea7fdcfa0a018b42e62a2/ai#L735C5-L735C61 735 response_text=$(jq -r '.content' <<<"$response_message")

jq2

cdhigh avatar Jun 24 '23 10:06 cdhigh

you may try placing a hexdump -C <<<"$response_message" before that line to see exactly what control character jq is complaining about, then run the script normally

nitefood avatar Jun 24 '23 10:06 nitefood

a lot of pages, can hexdump to a file?

I don't know the first thing about bash and shell~~~ I have some experiences in field of C/C++ and python

cdhigh avatar Jun 24 '23 10:06 cdhigh

complaining in line 17 col 178.

rsphexdump.txt

cdhigh avatar Jun 24 '23 11:06 cdhigh

This is another dump, line 11 col 151.

rsphexdump (1).txt

cdhigh avatar Jun 24 '23 11:06 cdhigh

After continuous searching and experimentation, it was discovered that the error was caused by a restriction in the JSON data where line breaks were not allowed. If a line break was present, it needed to be escaped. By modifying the following line of code: https://github.com/nitefood/ai-bash-gpt/blob/22e77b1a0c8c1aab6a2ea7fdcfa0a018b42e62a2/ai#L735C5-L735C61 735 response_text=$(jq -r '.content' <<<"$response_message") to response_text=$(jq -Rnr '[inputs] | join("\\n") | fromjson | .content' <<<"$response_message") The issue was totally resolved!

The code line is come from stackoverflow.

However, I am still don't know of the exact meaning of the modified line of code~~~

PS: I don't have glow installed in Kindle, maybe this is the reason you cannot reproduce the issue.

cdhigh avatar Jun 24 '23 13:06 cdhigh