
[BUG] Autogenerated titles are sometimes in Chinese. Are my conversations actually private?

crypdick opened this issue 1 year ago · 4 comments

Bug Description: Sometimes, auto-generated titles for English conversations are in Chinese.

Expected Results: According to the nameConversation prompt, the title should match the conversation's language, so for my English conversations it should always be in English.

Screenshots: (four screenshots attached showing the issue)

Desktop:

  • Operating System: Ubuntu 23.10
  • Application Version: 1.4.1

Additional Context: This issue makes me suspect that my conversations in the closed-source app are not truly private and are being sent to a custom model.

crypdick · Sep 13 '24

I just noticed that the releases listed on GitHub stop at 1.3.10; if you download from the website, you are served 1.4.2. The README does mention this:

This is the repository for the Chatbox Community Edition, open-sourced under the GPLv3 license. For most users, I recommend using the Chatbox Official Edition (closed-source).

But it does not clearly state that the download buttons below actually point to this closed-source version, which I think is a bit of a dark pattern and not cool at all.

I tried decompiling it and looking at the source code, but because Terser is used during packaging, the code is obfuscated, making it really difficult to see if anything shady is going on in this closed-source version.
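(For reference, a rough Python sketch of how one might unpack and pretty-print the bundle to at least skim it; the archive path and bundle filename below are guesses and will vary per install, and this assumes Node's asar tool and the jsbeautifier package are available.)

import subprocess
import jsbeautifier  # pip install jsbeautifier

# Unpack the Electron app's asar archive (path is a guess; adjust to your install).
subprocess.run(
    ["npx", "@electron/asar", "extract", "resources/app.asar", "extracted"],
    check=True,
)

# Pretty-print the Terser-minified bundle so it is at least skimmable.
with open("extracted/dist/main.js") as f:  # bundle filename is a guess
    pretty = jsbeautifier.beautify(f.read())

with open("main.pretty.js", "w") as f:
    f.write(pretty)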

creesch · Sep 18 '24

So I found this previous discussion, which gives a bit of context: https://github.com/Bin-Huang/chatbox/issues/803. Having read it, I find it less likely that any malice is involved, although of course I can't rule it out entirely.

I still think that the below section of the README should be clarified:

(screenshot of the relevant README download section)

At the very least, it should say "Closed source download for ".

creesch · Sep 18 '24

Alright, one last reply. I had a look with TCPView open, and when you open up Chatbox there is some traffic visible. This is to be expected, given the update check and all that.

The traffic goes to 170.106.175.29 which turns out to simply be chatboxai.app.
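(For anyone who wants to re-check that mapping later, here is a quick Python sketch, not part of the original report, that forward-resolves the domain and compares the result with the observed address; note that DNS answers can change over time.)

import socket

OBSERVED_IP = "170.106.175.29"  # address seen in TCPView

# Resolve chatboxai.app and collect its current address records.
records = {
    info[4][0]
    for info in socket.getaddrinfo("chatboxai.app", 443, proto=socket.IPPROTO_TCP)
}
print("chatboxai.app currently resolves to:", sorted(records))
print("observed address is among them:", OBSERVED_IP in records)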

When I click “new chat” I see activity to that address as well. Oddly enough, that seems to be the only UI element causing traffic; I suspect some sort of analytics is going on here. Mind you, at this point I have only clicked the button, not typed any prompt.

When I actually type something in the chat and send it off towards the LLM, I do not see any activity towards chatboxai.app. The only other traffic I see is towards the LLM provider I use, which is what I would expect.

So it looks like no data about your chats is being sent while chatting. The traffic I see on application startup is also not enough to indicate that previous chats are being sent somewhere. The traffic when clicking new chat is still a bit odd to me.

Overall, it looks like your data is safe. The behavior with the generated titles might simply be due to a bug in the closed-source version.

creesch · Sep 18 '24

Thank you for the detective work, @creesch!

crypdick · Sep 20 '24

Don't worry, your data is safe—Chatbox really values your privacy. As for why the closed-source edition's code is obfuscated, it's because I need to protect it. Honestly, with Electron, there's almost no way to safeguard the source code besides code obfuscation. Thanks to @creesch for the review and confirmation!

Getting back to the original issue with title generation, I don't think that should be happening. Which model are you using? Does your system prompt or context include any Chinese text? I'm really curious about this issue. If you could provide more details, that'd be great! @crypdick

Bin-Huang · Oct 07 '24

@Bin-Huang My system prompts and context are always written in English. I use a mix of OpenAI and Anthropic endpoints and I have seen this issue across both model providers.

crypdick · Oct 08 '24

@crypdick Thanks for the extra detail. Are the endpoints you mentioned official APIs from OpenAI and Anthropic? Also, which version of the Chatbox app are you using, and on what OS?

Bin-Huang · Oct 08 '24

That's right, nothing custom, official endpoints only.

Operating System: Ubuntu 23.10
Application Version: 1.4.1

crypdick · Oct 11 '24

This is indeed a very interesting bug, thanks for bringing it to my attention. I think I've found the root cause.

After multiple tests, I've discovered that the title-generation prompt Chatbox ultimately sends to the model doesn't have any issues, meaning it doesn't contain any hints to generate Chinese titles. However, the model (gpt-4o) itself has a tendency to produce Chinese titles: in my testing, I asked gpt-4o to generate a title for a purely English conversation, and it occasionally produced a Chinese one. After detailed testing, I found this happens with a certain probability (roughly 10% or less). It's pretty clear this is a case of the model hallucinating.

For anyone interested in this issue, you can reproduce my findings with the following code:

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# The title-generation prompt that Chatbox ultimately sends to the model.
content = "Name the conversation based on the chat records.\nPlease provide a concise name, within 10 characters and without quotation marks.\nPlease use the speak language in the conversation.\nYou only need to answer with the name.\nThe following is the conversation:\n\n```\nis there any npm packages that can help me make a auto-resized textarea\n\n---------\n\n\n```\n\nPlease provide a concise name, within 10 characters and without quotation marks.\nPlease use the speak language in the conversation.\nYou only need to answer with the name.\nThe conversation is named:"

# Ask gpt-4o for a title 40 times and print any response that contains
# non-ASCII characters (e.g. a Chinese title for an all-English conversation).
for _ in range(40):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": content}
        ]
    )
    response_content = response.choices[0].message.content or ''
    if any(ord(char) > 127 for char in response_content):
        print(response_content)

To fix this issue, I've tweaked the prompt for auto-generating titles, making sure it uses the language set in the app. This fix will be rolled out with the next update.

Thanks again for bringing this bug to my attention! It's hands down the most interesting bug I've fixed lately.

Bin-Huang · Oct 12 '24

This is an interesting bug. I think that this is caused by how the prompt is phrased. For example, the sentence "please use the speak language in the conversation" is not how a native speaker would write it; a more natural phrasing might be "please use the same language used in the conversation." This phrasing is a subtle signal to the model that the prompter is Chinese, which is why the summary sometimes includes Chinese characters, even though the prompt specifies to use the conversation's language.

crypdick · Oct 15 '24

Thanks for your insights! I think you're right. This prompt was probably shared by someone else online, and I didn't really look at its tone or style. I've now tried writing a new prompt myself, which should fix those issues:

Based on the chat history, give this conversation a name.
Keep it short - 10 characters max, no quotes.
Use ${language}.
Just provide the name, nothing else.

Here's the conversation:
{history}

Name this conversation in 10 characters or less.
Use ${language}.
Only give the name, nothing else.

The name is:
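(For illustration, a minimal Python sketch of how a template like this could be filled in at runtime; the string.Template syntax and variable names are assumptions for the example, not the actual Chatbox source.)

from string import Template

# Hypothetical template mirroring the draft above; not the actual Chatbox code.
NAME_CONVERSATION_PROMPT = Template(
    "Based on the chat history, give this conversation a name.\n"
    "Keep it short - 10 characters max, no quotes.\n"
    "Use $language.\n"
    "Just provide the name, nothing else.\n\n"
    "Here's the conversation:\n$history\n\n"
    "Name this conversation in 10 characters or less.\n"
    "Use $language.\n"
    "Only give the name, nothing else.\n\n"
    "The name is:"
)

prompt = NAME_CONVERSATION_PROMPT.substitute(
    language="English",  # the language configured in the app
    history="user: is there an npm package for an auto-resizing textarea?",
)
print(prompt)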

@crypdick Could you take a look and let me know what you think?

Bin-Huang · Oct 16 '24

@Bin-Huang much better, although it is redundant. I would delete everything after "Here's the conversation: {history}".

crypdick · Oct 16 '24