
[BUG] - Error: 'gbk' codec can't encode character

Open kksasa opened this issue 1 year ago • 2 comments

Description

The MS GraphRAG index is very hard to use. I uploaded several PDFs and none of them could be parsed; each failed with the error below: Error: 'gbk' codec can't encode character '\xa9' in position 238: illegal multibyte sequence

Reproduction steps

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Screenshots

![DESCRIPTION](LINK.png)

Logs

No response

Browsers

Chrome

OS

Windows

Additional information

No response

kksasa avatar Sep 10 '24 01:09 kksasa

Same error here. This is the file I'm trying to upload: car-kn1.md

I believe my document is encoded as UTF-8, so why is gbk being used?

I deployed without Docker on my Windows 11 PC.

smilefufu avatar Sep 11 '24 07:09 smilefufu

Same question. How can we change the encoding that is used?

RealmX1 avatar Sep 15 '24 01:09 RealmX1
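
On the two questions above (why gbk, and how to change it), here is a minimal Python sketch of the likely mechanism. It is not taken from the kotaemon or GraphRAG source. On Windows with a Chinese system locale, Python's `open()` falls back to the locale encoding (cp936, i.e. gbk) when no `encoding=` argument is given, and gbk has no mapping for `'\xa9'` (the copyright sign), so writing extracted text fails even if the uploaded file itself is UTF-8. The helper names `encode_or_error`, `write_text`, and `read_text` are hypothetical and exist only for this illustration; where exactly the failing write happens in the indexing pipeline is an assumption.

```python
import os
import tempfile

COPYRIGHT = "\xa9"  # the character named in the error message


def encode_or_error(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


def write_text(path, text, encoding="utf-8"):
    """Write text with an explicit encoding instead of the locale default."""
    with open(path, "w", encoding=encoding) as fh:
        fh.write(text)


def read_text(path, encoding="utf-8"):
    """Read text back with the same explicit encoding."""
    with open(path, "r", encoding=encoding) as fh:
        return fh.read()


if __name__ == "__main__":
    # gbk cannot represent the copyright sign; utf-8 can.
    print(encode_or_error(COPYRIGHT, "gbk"))
    print(encode_or_error(COPYRIGHT, "utf-8"))

    # An explicit encoding makes the write independent of the system locale.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        write_text(path, "Copyright " + COPYRIGHT + " 2024")
        print(read_text(path))
```

If the failing `open()` call sits inside a dependency rather than in code you can edit, one thing that may help is starting the app with the environment variable `PYTHONUTF8=1` (Python's UTF-8 mode, available since Python 3.7), which makes UTF-8 the default text encoding on Windows. Whether that resolves this particular issue has not been confirmed in this thread.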