private-gpt
add exception handling for document load errors
I encountered a PDFSyntaxError. When loading multiple documents, an error in one file should not crash the whole load; instead, the user should be notified of the errors.
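To illustrate the requested behavior, here is a minimal sketch, not the project's actual ingestion code. The names `load_documents_safely`, `fake_loader`, and `format_failures` are hypothetical, and `fake_loader` only stands in for a real parser that might raise something like a PDFSyntaxError. The idea is to wrap each file's load in its own try/except so one bad file is recorded and reported instead of aborting the batch.

```python
from typing import Callable, Iterable, List, Tuple


def load_documents_safely(
    paths: Iterable[str],
    load_one: Callable[[str], str],
) -> Tuple[List[str], List[Tuple[str, Exception]]]:
    """Load every path, collecting failures instead of aborting the batch."""
    loaded: List[str] = []
    failures: List[Tuple[str, Exception]] = []
    for path in paths:
        try:
            loaded.append(load_one(path))
        except Exception as exc:
            # Broad on purpose: different parsers raise different error types.
            failures.append((path, exc))
    return loaded, failures


def fake_loader(path: str) -> str:
    """Hypothetical stand-in for a real parser; fails on names containing 'broken'."""
    if "broken" in path:
        raise ValueError("Unexpected EOF")
    return f"contents of {path}"


def format_failures(failures: List[Tuple[str, Exception]]) -> List[str]:
    """Build one user-facing message per failed file."""
    return [f" - {path}: ERROR: {exc}" for path, exc in failures]


if __name__ == "__main__":
    docs, failed = load_documents_safely(
        ["a.pdf", "broken.pdf", "b.epub"], fake_loader
    )
    print(f"Loaded {len(docs)} new documents")
    for line in format_failures(failed):
        print(line)
```

Catching `Exception` per file is a deliberate trade-off in this sketch: it keeps the batch running regardless of which parser failed, at the cost of also hiding programming errors, so the failures are returned to the caller rather than silently dropped.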
This works as described for me. A few EPUB files fail with ERROR: Pandoc died with exitcode "64" during conversion: xmlns not in namespaces, and a few PDFs fail with Unexpected EOF, but loading continues:
Loading new documents: 71%|████████████▊ | 197/276 [00:04<00:01, 62.00it/s] - source_documents/TuffShit/Computer_Books/30 Assorted Computers and Technology Books Collection April 17, 2021/Springer Handbook of Power Systems by Konstantin O. Papailiou.pdf: ERROR: Unexpected EOF
Loading new documents: 79%|██████████████▎ | 219/276 [00:37<00:09, 5.79it/s]
Loaded 219 new documents from source_documents
Split into 15726 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Ingestion complete! You can now run privateGPT.py to query your documents