kotaemon
[BUG] - 'gbk' codec can't decode byte 0x8c in position 2: illegal multibyte sequence when using GraphIndex
Description
When I try to build a graph index, I upload a PDF file and start building the index. During this process, the following error occurs:
Indexing [1/1]: small_test.pdf
=> Converting small_test.pdf to text
=> Converted small_test.pdf to text
=> [small_test.pdf] Processed 2 chunks
=> Finished indexing small_test.pdf
[GraphRAG] Creating index... This can take a long time.
Logging enabled at
c:\Users\**\Desktop\small\remote\kotaemon\ktem_app_data\user_data\files\graphrag\8ebbc1ff-2bef-49aa-803a-c72ffcbeb476\output\20240909-162212\reports\indexing-engine.log
Error: 'gbk' codec can't decode byte 0x8c in position 2: illegal multibyte sequence
Are there any constraints or limitations on the uploaded PDF document?
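For context on the error message: a `'gbk' codec can't decode` error typically appears when Python reads UTF-8 text using the Windows locale default encoding (GBK on Chinese-language Windows) because no explicit `encoding` was passed to `open()`. Which file read fails here is not shown in the output above, so the sketch below is an assumption about the cause, not a confirmed diagnosis. The helper names `read_text` and `demo` and the file name `chunk.txt` are made up for illustration.

```python
import os
import tempfile


def read_text(path, encoding=None):
    """Read a text file. encoding=None falls back to the platform default,
    which is GBK on a Chinese-language Windows install."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    """Write UTF-8 text, then read it back as GBK and as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "chunk.txt")  # hypothetical file name
        # "你" is the three bytes E4 BD A0 in UTF-8. Followed by a newline,
        # the last byte cannot start a valid GBK character.
        with open(path, "w", encoding="utf-8", newline="\n") as f:
            f.write("你\n")

        try:
            read_text(path, encoding="gbk")  # simulates the Windows default
            gbk_failed = False
        except UnicodeDecodeError:
            gbk_failed = True

        # Passing the encoding explicitly avoids the locale dependency.
        text = read_text(path, encoding="utf-8")
    return gbk_failed, text


if __name__ == "__main__":
    print(demo())
```

If the cause is the same here, one possible workaround is to start the app with the environment variable `PYTHONUTF8=1` (Python's UTF-8 mode), which makes `open()` default to UTF-8. Whether that resolves this particular issue has not been verified.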
Reproduction steps
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
Screenshots
No response
Logs
No response
Browsers
No response
OS
No response
Additional information
No response
I'm encountering the same issue when using GraphRAG indexing. The UI doesn't provide enough information to debug it, and I can't find any logging for it in the console or a log file for the GraphRAG indexing process.
The same PDF indexes fine with the normal indexing process.
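The output in the original report mentions a log file at `...\files\graphrag\<id>\output\<timestamp>\reports\indexing-engine.log`. Assuming that layout holds on other installs, a small script along these lines could locate the most recent log and print its last lines. The function names `latest_indexing_log` and `tail` are hypothetical, and the directory layout is inferred only from that one path.

```python
from pathlib import Path


def latest_indexing_log(graphrag_root):
    """Return the newest indexing-engine.log under graphrag_root, or None.

    Assumes the layout <root>/<id>/output/<timestamp>/reports/indexing-engine.log
    and that timestamp folder names (e.g. 20240909-162212) sort chronologically.
    """
    logs = list(Path(graphrag_root).glob("*/output/*/reports/indexing-engine.log"))
    if not logs:
        return None
    # parent is "reports", parent.parent is the timestamp folder
    return max(logs, key=lambda p: p.parent.parent.name)


def tail(path, n=20):
    """Return the last n lines; undecodable bytes are replaced, not raised."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.readlines()[-n:]


if __name__ == "__main__":
    # Adjust to the local ktem_app_data location.
    log = latest_indexing_log("ktem_app_data/user_data/files/graphrag")
    if log is None:
        print("no indexing-engine.log found")
    else:
        print(log)
        print("".join(tail(log)))
```

Reading with `errors="replace"` is deliberate: it keeps the script from failing with the same kind of decode error that is being investigated.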
Have you solved it yet? cin-jimmy