
[BUG] - 'gbk' codec can't decode byte 0x8c in position 2: illegal multibyte sequence when using GraphIndex

Open flyboyer opened this issue 1 year ago • 2 comments

Description

When I try to build a graph index, I upload a PDF file and start building the index. During this process, the following error occurs:

Indexing [1/1]: small_test.pdf
 => Converting small_test.pdf to text
 => Converted small_test.pdf to text
 => [small_test.pdf] Processed 2 chunks
 => Finished indexing small_test.pdf
[GraphRAG] Creating index... This can take a long time.
Logging enabled at c:\Users\**\Desktop\small\remote\kotaemon\ktem_app_data\user_data\files\graphrag\8ebbc1ff-2bef-49aa-803a-c72ffcbeb476\output\20240909-162212\reports\indexing-engine.log

Error: 'gbk' codec can't decode byte 0x8c in position 2: illegal multibyte sequence

Are there any constraints or limitations on the uploaded PDF document?
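
For context, the error above is the kind of message Python raises when bytes that are not valid GBK are decoded with the GBK codec, which is the default text encoding on Windows systems set to a Chinese locale. The following is a minimal sketch of that mechanism only, assuming the failing step reads a UTF-8 file without an explicit encoding; the issue does not confirm which file or which call inside GraphRAG is involved, and the sample text and file name are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical sample: UTF-8 text containing one non-ASCII character.
# This is NOT the content of the file from the issue.
text = "丌 test\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"  # hypothetical file name
    path.write_text(text, encoding="utf-8")

    # Passing encoding="gbk" imitates what open() does by default on a
    # Windows machine whose locale codec is GBK.
    try:
        path.read_text(encoding="gbk")
        gbk_error = None
    except UnicodeDecodeError as exc:
        gbk_error = exc

    # Reading the same file with an explicit UTF-8 encoding succeeds.
    utf8_text = path.read_text(encoding="utf-8")

print("GBK read failed with:", gbk_error)
print("UTF-8 read returned:", repr(utf8_text))
```

If this is the cause, one thing that may be worth trying is Python's UTF-8 mode (setting the environment variable `PYTHONUTF8=1` before launching the app), which makes `open()` default to UTF-8 on Windows. Whether that resolves this particular issue is an assumption, not something confirmed in this thread.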

Reproduction steps

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Screenshots

No response

Logs

No response

Browsers

No response

OS

No response

Additional information

No response

flyboyer • Sep 09 '24 08:09

I'm encountering the same issue with GraphRAG indexing. The UI doesn't provide enough information to debug it, and I can't find any logging for it in the console or any log file for the GraphRAG indexing process.

The same PDF indexes fine with the normal indexing process.

2013 Reinforcement Learning in Robotics - A Survey.pdf

RealmX1 • Sep 10 '24 01:09

Have you solved it yet? cin-jimmy

zjiang4 • Sep 28 '24 18:09