
File encoding issue: Chinese characters corrupted when reading/writing non-UTF-8 files

Open Professor-Chen opened this issue 5 months ago • 8 comments

Description

Bug Summary

When working with files that are not encoded in UTF-8, Chinese characters become corrupted during both read and write operations.

Environment

  • Kilo Version: [Version: 4.60.0 (7c81dc59)]
  • API Provider: [Anthropic]
  • Model: [Claude 4 Sonnet]
  • Operating System: [Windows 11]

Steps to Reproduce

  1. Create or open a file containing Chinese characters that uses a non-UTF-8 encoding (e.g., GBK, GB2312)
  2. Attempt to read the file content using Kilo's file reading functionality
  3. Observe Chinese character corruption in the displayed content
  4. Try to edit and save the file
  5. Notice that previously correct Chinese characters also become corrupted

Expected Behavior

  • Files with different encodings should be properly detected and handled
  • Chinese characters should display correctly regardless of the original file encoding
  • Writing to files should preserve the original encoding or provide encoding conversion options
  • Existing correct Chinese characters should not be corrupted during editing

Actual Behavior

  • Chinese characters appear as garbled text when reading non-UTF-8 files
  • Writing operations corrupt both new and existing Chinese characters
  • No encoding detection or conversion mechanism appears to be in place
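
The symptoms above are consistent with the file bytes being decoded as UTF-8 on read and re-encoded as UTF-8 on write. That is an assumption about the cause, not something confirmed from Kilo Code's source. A minimal sketch of that failure mode, using only the standard `TextDecoder` and `TextEncoder` (the function name `simulateUtf8RoundTrip` is hypothetical):

```typescript
// Hypothetical illustration of the suspected failure mode; not Kilo Code's code.
function simulateUtf8RoundTrip(original: Uint8Array): { text: string; bytes: Uint8Array } {
  // Read step: a non-fatal UTF-8 decode replaces every invalid sequence with U+FFFD.
  const text = new TextDecoder("utf-8").decode(original);
  // Write step: TextEncoder always emits UTF-8, so the original bytes are not restored.
  const bytes = new TextEncoder().encode(text);
  return { text, bytes };
}

// The bytes of "中文" as stored in a GBK/GB2312 file.
const gbkSample = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);
const roundTrip = simulateUtf8RoundTrip(gbkSample);
console.log(JSON.stringify(roundTrip.text));
console.log(gbkSample.length, roundTrip.bytes.length);
```

Once a byte sequence has been replaced with U+FFFD, the original character cannot be recovered from the string, which would explain why text that was correct before editing is also damaged after saving.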

Impact

  • Cannot properly work with legacy files using GBK/GB2312 encoding
  • Risk of data corruption when editing files containing Chinese text
  • Affects workflow for users working with mixed-encoding file environments

Suggested Solution

  • Implement automatic encoding detection for common Chinese encodings (UTF-8, GBK, GB2312, etc.)
  • Provide encoding selection options in the UI
  • Add proper encoding conversion when reading/writing files
  • Display encoding information and allow manual override if needed
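
One way the detection suggested above could be sketched is to try a strict UTF-8 decode first and fall back to a legacy encoding only when that fails. The helper name `decodeWithFallback`, the `DecodedFile` shape, and the default fallback list are hypothetical, and the sketch assumes a runtime whose `TextDecoder` supports legacy labels such as `gb18030` (for example, Node.js built with full ICU):

```typescript
// Hypothetical helper; a sketch under the assumptions stated above.
interface DecodedFile {
  text: string;
  encoding: string; // the label that decoded the bytes without errors
}

function decodeWithFallback(
  bytes: Uint8Array,
  fallbacks: string[] = ["gb18030"], // GB18030 is a superset of GBK and GB2312
): DecodedFile {
  for (const encoding of ["utf-8", ...fallbacks]) {
    try {
      // fatal: true makes decode() throw on invalid input instead of emitting U+FFFD.
      const text = new TextDecoder(encoding, { fatal: true }).decode(bytes);
      return { text, encoding };
    } catch {
      // Invalid bytes for this encoding, or a label the runtime does not support.
    }
  }
  throw new Error("Could not decode file with any candidate encoding");
}

// Usage: the bytes of "中文" as stored in a GBK/GB2312 file.
const detected = decodeWithFallback(new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]));
console.log(detected.encoding, detected.text);
```

Validity checks alone cannot tell legacy encodings apart reliably: the same bytes can be valid in both GBK and EUC-KR, so the manual override requested above would still be needed. The sketch also covers reading only. `TextEncoder` emits UTF-8 exclusively, so saving a file in its original encoding would need a separate encoder, for example a library such as iconv-lite.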

Additional Context

This issue particularly affects users working with legacy systems or files from different regions where UTF-8 is not the default encoding standard.

Professor-Chen avatar Jul 22 '25 00:07 Professor-Chen

Thanks for your bug report. If other users have this same issue, please comment here, so we can prioritize it accordingly.

chrarnoldus avatar Jul 22 '25 14:07 chrarnoldus

Yes, I also encountered the same issue. For historical reasons, the code files I edit use GBK encoding. When I use Kilo Code to help me modify the code, it corrupts my original files, turning all the existing Chinese characters into gibberish. You can see the specific issue in the image below. I hope support for non-Unicode encodings can be added.

Image

kashima19960 avatar Aug 03 '25 11:08 kashima19960

We also encountered this issue, which has left our team of at least 20 people unable to use Kilo Code.

yuchen1117 avatar Aug 14 '25 09:08 yuchen1117

I also encountered the same issue

luhao200 avatar Aug 17 '25 08:08 luhao200

Can someone share a file that demonstrates this issue?

chrarnoldus avatar Aug 17 '25 09:08 chrarnoldus

Can someone share a file that demonstrates this issue?

EncoTest.java

This is a file I tested. Opening it in VS Code with GB2312 encoding displays the Chinese correctly. Asking Kilo Code to modify the Java class comments causes the Chinese characters on the class methods to become garbled.

yuchen1117 avatar Aug 21 '25 07:08 yuchen1117

I have this issue as well.

Image

The original encoding is EUC-KR.

ap0calypse21 avatar Sep 26 '25 09:09 ap0calypse21

Cline has great support for reading and editing non-UTF-8 encoded files. I previously noticed this commit that fixed file encoding issues. Please forgive my lack of expertise in TypeScript: I tried to fix the file encoding issue in Roo Code myself but wasn't successful. 😅

While using Cline, I've observed that although the AI tool handles file encoding well when editing or reading files (thanks to that commit), it does not convert non-UTF-8 files to UTF-8 when adding them as context. As a result, the AI ends up re-reading the same file later, which wastes tokens. It would be great if we could address this together. 😉 @chrarnoldus

liuyu80 avatar Nov 25 '25 01:11 liuyu80