Wrong character encoding in responses

Open CosmicMac opened this issue 1 year ago • 5 comments

Before submitting your bug report

Relevant environment info

- OS: Windows 10 & 11
- Continue: 0.0.33
- IDE: JetBrains (CLion/PhpStorm/DataGrip, latest versions, new UI)
- Server: docker ollama:latest

Description

Extended characters in responses are badly encoded (e.g. "Ã©" instead of "é"). Encoding is OK in direct responses from Ollama when prompting from the terminal.

To reproduce

1/ Select any gemma model
2/ Prompt "Translate elegant to french"

Log output

No response

CosmicMac avatar Feb 29 '24 13:02 CosmicMac

I tried this in both VS Code and IntelliJ and found that the encoding looked as expected (though Gemma gives interesting answers).

I'm wondering if this might be the model literally outputting "Ã©" due to something it saw in its dataset. If you say something like "repeat after me: 'é'", can you get it to output the correct encoding?

[Screenshots: 2024-02-29 at 2:26:13 PM and 2024-02-29 at 2:22:28 PM]

sestinj avatar Feb 29 '24 19:02 sestinj

gemma on acid :)

Unfortunately same problem with the repeat prompt: [screenshot: continue01]

A quick test in console: [screenshot: continue02]

CosmicMac avatar Mar 01 '24 12:03 CosmicMac

To me, it looks like double UTF-8 encoding. As I'm using a French OS, maybe an automatic encoding step is occurring before a forced encoding (or the other way round)? That would explain why you can't reproduce the glitch on your system.
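
For illustration, a minimal Python sketch of what double UTF-8 encoding would look like. It assumes the intermediate step misreads the UTF-8 bytes as Windows-1252 (cp1252), which is a guess based on the symptoms, not something confirmed for Continue or the IDE. The function names are hypothetical.

```python
# Sketch: reproduce the suspected "double UTF-8" effect and undo it.
# Assumption: the wrong intermediate decode uses Windows-1252 (cp1252).

def double_encode(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode as UTF-8, then mis-decode those bytes with a legacy code page."""
    return text.encode("utf-8").decode(wrong_codec)


def repair(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Reverse the mis-decode. Raises if the text was not garbled this way."""
    return garbled.encode(wrong_codec).decode("utf-8")


garbled = double_encode("élégant")
print(garbled)          # Ã©lÃ©gant
print(repair(garbled))  # élégant
```

If the garbled text in the IDE round-trips cleanly through `repair`, that would support the double-encoding theory; a `UnicodeError` would suggest a different cause.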

CosmicMac avatar Mar 01 '24 13:03 CosmicMac

Ah this makes sense. Is this built into the OS, or might there be a setting that I could change in order to simulate this?

sestinj avatar Mar 01 '24 15:03 sestinj

Unfortunately I have no idea :(

CosmicMac avatar Mar 01 '24 17:03 CosmicMac