Cannot add an embedding for a file in .docx format.

Open Vardan9898 opened this issue 1 year ago • 1 comments

ErrorException: Undefined array key "data" in /home/.../vendor/theodo-group/llphant/src/Embeddings/EmbeddingGenerator/OpenAI/AbstractOpenAIEmbeddingGenerator.php:117

sample1.docx

Code:

$reader = new FileDataReader($filePath, PlaceEntity::class);
$documents = $reader->getDocuments();
$splittedDocuments = DocumentSplitter::splitDocuments($documents, 1024, "\n");
$formattedDocuments = EmbeddingFormatter::formatEmbeddings($splittedDocuments);
$embededDocuments = $embeddingGenerator->embedDocuments($formattedDocuments);

I use OpenAI3SmallEmbeddingGenerator().

Vardan9898 • Aug 11 '24 11:08

JsonException: Malformed UTF-8 characters, possibly incorrectly encoded in /home/.../vendor/theodo-group/llphant/src/Embeddings/EmbeddingGenerator/OpenAI/AbstractOpenAIEmbeddingGenerator.php:111

Another error, this time when uploading a PDF file.

Vardan9898 • Aug 11 '24 12:08

> JsonException: Malformed UTF-8 characters, possibly incorrectly encoded in /home/.../vendor/theodo-group/llphant/src/Embeddings/EmbeddingGenerator/OpenAI/AbstractOpenAIEmbeddingGenerator.php:111
>
> Another error, this time when uploading a PDF file.

I think this error is related to the smalot/pdfparser library, which is used for parsing PDFs. One option would be to open an issue there. In any case, could you please provide a sample PDF file that triggers it? Thank you.

f-lombardo • Aug 12 '24 16:08

> ErrorException: Undefined array key "data" in /home/.../vendor/theodo-group/llphant/src/Embeddings/EmbeddingGenerator/OpenAI/AbstractOpenAIEmbeddingGenerator.php:117

I created a PR to try to fix this issue: https://github.com/theodo-group/LLPhant/pull/200

Please check if it works for you

f-lombardo • Aug 12 '24 16:08

It seems to be fixed for docx, but could you please check with the attached PDF file (The-No-Funnel-Strategy.pdf)? I get an error:

JsonException: Malformed UTF-8 characters, possibly incorrectly encoded in /home/.../vendor/theodo-group/llphant/src/Embeddings/EmbeddingGenerator/OpenAI/AbstractOpenAIEmbeddingGenerator.php:111

Vardan9898 • Aug 12 '24 17:08

> It seems to be fixed for docx, but could you please check with the attached PDF file?

I pushed a new commit to the previous PR. Can you please check if it works for you?

f-lombardo • Aug 13 '24 10:08

It works. Thank you.

Vardan9898 • Aug 13 '24 17:08

Thanks @Vardan9898 for the issue and @f-lombardo for the PR!

MaximeThoonsen • Aug 14 '24 15:08
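
For anyone who hits the same "Malformed UTF-8 characters" JsonException on a version without the fix, here is a minimal sketch of a possible workaround: scrub the extracted text before it reaches the embedding generator. This is not necessarily what the PR above does. The helper name toValidUtf8 and the sample string are hypothetical and not part of LLPhant; only standard PHP functions (mb_check_encoding, mb_scrub, json_encode) are used.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of LLPhant): returns a string that
 * json_encode() will accept, replacing ill-formed UTF-8 byte sequences.
 */
function toValidUtf8(string $text): string
{
    // Already valid UTF-8: return unchanged.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // mb_scrub() (PHP >= 7.2) replaces invalid byte sequences
    // with the mbstring substitute character.
    return mb_scrub($text, 'UTF-8');
}

// Assumed sample input: "é" as a single Latin-1 byte, which is invalid UTF-8.
// Text extracted from some PDFs may contain bytes like this.
$broken = "caf\xE9 menu";

try {
    json_encode(['input' => $broken], JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    // Same message as in the stack traces reported in this issue.
    echo 'Before: ', $e->getMessage(), PHP_EOL;
}

// After scrubbing, encoding no longer throws.
$clean = toValidUtf8($broken);
echo 'After: ', json_encode(['input' => $clean], JSON_THROW_ON_ERROR), PHP_EOL;
```

In the pipeline from the original report, one would presumably apply such a helper to each document's content after splitting and before calling embedDocuments(). Scrubbing discards the offending bytes, so if the source encoding is known, converting it properly with mb_convert_encoding() is preferable.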