File encoding issue: Chinese characters corrupted when reading/writing non-UTF-8 files
Description
Bug Summary
When working with files that are not encoded in UTF-8, Chinese characters become corrupted during both read and write operations.
Environment
- Kilo Version: 4.60.0 (7c81dc59)
- API Provider: Anthropic
- Model: Claude 4 Sonnet
- Operating System: Windows 11
Steps to Reproduce
- Create or open a file containing Chinese characters that is encoded in non-UTF-8 format (e.g., GBK, GB2312)
- Attempt to read the file content using Kilo's file reading functionality
- Observe Chinese character corruption in the displayed content
- Try to edit and save the file
- Notice that previously correct Chinese characters also become corrupted
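To make the first reproduction step concrete, here is a minimal Python sketch that creates a GBK-encoded sample file (the file name and comment text are illustrative, not taken from the report):

```python
# Create a small GBK-encoded file containing Chinese text.
# File name and content are hypothetical examples for reproduction only.
text = "// 这是一个中文注释 (a Chinese comment)\n"

with open("gbk_sample.txt", "w", encoding="gbk") as f:
    f.write(text)

# The bytes on disk are GBK, not UTF-8: the same characters
# produce different byte sequences under the two encodings.
raw = open("gbk_sample.txt", "rb").read()
assert raw.decode("gbk") == text
assert raw != text.encode("utf-8")
```

Opening this file with a tool that assumes UTF-8 should reproduce the garbled output described above.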
Expected Behavior
- Files with different encodings should be properly detected and handled
- Chinese characters should display correctly regardless of the original file encoding
- Writing to files should preserve the original encoding or provide encoding conversion options
- Existing correct Chinese characters should not be corrupted during editing
Actual Behavior
- Chinese characters appear as garbled text when reading non-UTF-8 files
- Writing operations corrupt both new and existing Chinese characters
- No encoding detection or conversion mechanism appears to be in place
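The symptoms are consistent with GBK bytes being decoded as UTF-8. A minimal Python illustration of that failure mode (the sample string is hypothetical):

```python
# GBK bytes decoded as UTF-8 yield mojibake, not the original text.
gbk_bytes = "中文".encode("gbk")
garbled = gbk_bytes.decode("utf-8", errors="replace")

# Some of the GBK byte pairs are invalid UTF-8, so replacement
# characters (U+FFFD) appear and the original text is lost.
assert garbled != "中文"
assert "\ufffd" in garbled
```

If the tool then re-encodes this garbled string and writes it back, the corruption becomes permanent, which would explain why previously correct characters are destroyed on save.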
Impact
- Cannot properly work with legacy files using GBK/GB2312 encoding
- Risk of data corruption when editing files containing Chinese text
- Affects workflow for users working with mixed-encoding file environments
Suggested Solution
- Implement automatic encoding detection for common Chinese encodings (UTF-8, GBK, GB2312, etc.)
- Provide encoding selection options in the UI
- Add proper encoding conversion when reading/writing files
- Display encoding information and allow manual override if needed
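As a rough sketch of the detect-and-preserve idea above, the following Python example tries UTF-8 first and falls back to common Chinese encodings, remembering the detected encoding so writes can re-encode in the file's original encoding. The function names and candidate list are assumptions for illustration, not Kilo Code's actual API; a production implementation would likely use a statistical detection library instead of trial decoding.

```python
# Hypothetical read/write helpers that preserve a file's original encoding.
# GB18030 is a superset of GBK/GB2312, so it serves as a broad fallback.
CANDIDATES = ("utf-8", "gb18030", "big5")

def read_with_detection(path):
    """Return (text, detected_encoding) for the file at `path`."""
    raw = open(path, "rb").read()
    for enc in CANDIDATES:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: lossy UTF-8 decode rather than crashing.
    return raw.decode("utf-8", errors="replace"), "utf-8"

def write_preserving(path, text, enc):
    """Write `text` back using the encoding it was read with."""
    with open(path, "wb") as f:
        f.write(text.encode(enc))
```

With this approach, a GBK file is decoded correctly on read and its bytes are unchanged after a no-op edit, instead of being mangled through an implicit UTF-8 round trip.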
Additional Context
This issue particularly affects users working with legacy systems or files from different regions where UTF-8 is not the default encoding standard.
Thanks for your bug report. If other users have the same issue, please comment here so we can prioritize it accordingly.
Yes, I encountered the same issue. For historical reasons, the code files I edit use GBK encoding. When I use Kilo Code to help me modify the code, it corrupts my original files, turning all the existing Chinese characters into gibberish. You can see the specific issue in the image below. I hope non-Unicode encodings can be supported.
We also encountered this issue; it has left our team of at least 20 people unable to use Kilo Code.
I also encountered the same issue
Can someone share a file that demonstrates this issue?
EncoTest.java: This is a file I tested. With GB2312 encoding, VS Code displays the Chinese text correctly, but asking Kilo Code to modify the Java class comments causes the Chinese characters on the class methods to become garbled.
I have this issue as well; the original file encoding is EUC-KR.
Cline has great support for editing and reading non-UTF-8 encoded files; I previously noticed this commit that fixed file encoding issues. Please forgive my lack of expertise in TypeScript; I tried to fix the file encoding issue in Roo Code myself but wasn't successful. 😅
During my use of Cline, I've observed that while it handles file encoding well when editing or reading files (thanks to that commit), it does not convert non-UTF-8 files to UTF-8 when adding context. As a result, the AI ends up re-reading the same file later, which wastes tokens. It would be great if we could address this together. 😉 @chrarnoldus