code2prompt

Software cannot read files with Chinese characters normally

Open AlniyatYang opened this issue 1 year ago • 3 comments

I have a project folder of single-chip microcomputer code, all written in C. If a .c or .h file contains a comment with Chinese characters, that file is not displayed in the generated Markdown file. When I delete all the Chinese comments and generate again, the code is displayed.

AlniyatYang avatar Nov 14 '24 07:11 AlniyatYang

Hi @AlniyatYang, I tried to reproduce the issue with files containing Chinese characters, and it seems to work in the latest version (potentially since #71). Can you confirm it works on your side? Thank you.

ODAncona avatar Feb 13 '25 02:02 ODAncona

I installed the latest version using cargo on April 25th and encountered the same issue in a single-chip microcomputer project. When the project's files are encoded in GB2312, all Chinese comments have to be removed for it to work properly.

lilimu996 avatar May 07 '25 05:05 lilimu996
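
The report above is consistent with the affected files not being valid UTF-8, although the thread does not confirm how code2prompt reads files. A minimal Python sketch of a user-side check, assuming the project's .c and .h files are GB2312-encoded as described; the helper names `is_valid_utf8` and `find_non_utf8_sources` are hypothetical and not part of code2prompt:

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_sources(root: str, suffixes=(".c", ".h")) -> list[Path]:
    """List source files under root whose bytes are not valid UTF-8."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes and not is_valid_utf8(p.read_bytes())
    )


# The same comment is valid UTF-8 in one encoding and invalid in the other.
comment = "/* 中文注释 */"
utf8_bytes = comment.encode("utf-8")
gb2312_bytes = comment.encode("gb2312")
```

Running `find_non_utf8_sources("path/to/project")` would list the files likely to be affected, which could then be converted to UTF-8 instead of having their comments removed.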

Hello @lilimu996, thanks for reproducing the issue.

It is interesting to see that code2prompt struggles with a specific encoding.

This should be investigated, as many legacy codebases use special encodings.

It would be nice to write a general encoding implementation 🚀

ODAncona avatar May 08 '25 01:05 ODAncona
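
One possible reading of a "general encoding implementation" is a decoder that tries a list of encodings in order. The Python sketch below only illustrates that idea and is not code2prompt's implementation (the tool is installed with cargo, so the real fix would live in its own code). The function name `decode_with_fallback` and the default encoding list are assumptions; GB18030 is used as the fallback because it is a superset of GB2312.

```python
def decode_with_fallback(
    data: bytes, encodings=("utf-8", "gb18030")
) -> tuple[str, str]:
    """Decode bytes with the first encoding that succeeds.

    Returns the decoded text and the name of the encoding used.
    If every candidate fails, fall back to lossy UTF-8 so the file
    is still included instead of being skipped.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


source = "int main(void) { return 0; } /* 中文注释 */"
text_from_gb, used_for_gb = decode_with_fallback(source.encode("gb2312"))
text_from_utf8, used_for_utf8 = decode_with_fallback(source.encode("utf-8"))
```

Order matters in such a list: strict UTF-8 should come first, because permissive legacy encodings can accept bytes that were never written in them and produce wrong text without raising an error.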