markitdown

UnicodeEncodeError: 'gbk' codec can't encode character '\uf075' in position XXX: illegal multibyte sequence

[Open] ClusterA-DragReduction opened this issue 9 months ago • 3 comments

[Image]

ClusterA-DragReduction, Mar 21 '25 11:03

I have run [System.Console]::OutputEncoding = [System.Text.Encoding]::UTF8, but it did not help. With another document, I get: UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 55: illegal multibyte sequence
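For context on the error itself: Python raises UnicodeEncodeError when it encodes text for an output stream whose encoding is GBK (Python's codec for cp936, the usual locale encoding on Simplified Chinese Windows) and the text contains a character GBK cannot represent, such as U+2022 (•) above or the private-use U+F075 in the title. The sketch below reproduces this with an in-memory stream instead of the real console; it uses only the standard library, and the helper names write_through and fits_encoding are hypothetical.

```python
import io


def write_through(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Push text through a text stream with the given encoding, roughly as print() does."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)  # text is encoded here; "strict" raises on unmappable characters
    stream.flush()
    return raw.getvalue()


def fits_encoding(text: str, encoding: str) -> bool:
    """Return True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = "\u2022 first item"  # U+2022 BULLET, the character from the second report

print(fits_encoding(sample, "gbk"))    # False: GBK has no mapping for U+2022
print(fits_encoding(sample, "utf-8"))  # True
print(write_through(sample, "utf-8"))
print(write_through(sample, "gbk", errors="replace"))
```

With errors="replace", unmappable characters become "?" instead of raising, which avoids the crash but loses the character; encoding as UTF-8 keeps it.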

ClusterA-DragReduction, Mar 21 '25 12:03

Thanks for the report. I will investigate.

Does this happen only with the CLI, or also with the library?

afourney, Mar 21 '25 16:03

OK, I think I found the problem and fixed it in 0.1.0a6. Please let me know if it works better for you. Once the fix is confirmed, I will close this issue.
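One common way for a Python CLI to stop depending on the console's encoding is to switch stdout to UTF-8 before writing. This is a sketch under that assumption, not necessarily what changed in 0.1.0a6; emit_markdown is a hypothetical helper, and TextIOWrapper.reconfigure requires Python 3.7 or later. A similar effect can be had from outside the program by setting the PYTHONIOENCODING=utf-8 environment variable before running it.

```python
import io
import sys
from typing import Optional


def emit_markdown(text: str, stream: Optional[io.TextIOWrapper] = None) -> None:
    """Write text to a text stream encoded as UTF-8 (hypothetical helper).

    Defaults to sys.stdout, whose encoding otherwise follows the console or
    locale (for example GBK/cp936 on Simplified Chinese Windows).
    """
    if stream is None:
        stream = sys.stdout
    # reconfigure() exists on io.TextIOWrapper (Python 3.7+); replacement
    # streams, such as some IDE consoles, may lack it, so check first.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    stream.write(text)
    stream.flush()


# Simulate a GBK-encoded stdout with an in-memory buffer.
raw = io.BytesIO()
gbk_stdout = io.TextIOWrapper(raw, encoding="gbk")
emit_markdown("\u2022 first item", gbk_stdout)
print(raw.getvalue())  # UTF-8 bytes; no UnicodeEncodeError despite the GBK default
```

Reconfiguring the existing stream keeps print() and other writers working unchanged, whereas writing bytes to sys.stdout.buffer directly would bypass any text already buffered in the wrapper.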

afourney, Mar 21 '25 17:03

Thank you for the quick fix. I have tested the files that had problems, and they now work, even without running [System.Console]::OutputEncoding = [System.Text.Encoding]::UTF8. Thanks again.

ClusterA-DragReduction, Mar 24 '25 11:03