kotaemon
[BUG] Word docx failing embedding
Description
Embedding fails for files in Word .docx format.
The unstructured loader/reader raises an error.
The embedding model in use is nomic-embed-text.
Reproduction steps
1. In the UI, select "Click to Upload" and attach a local Word .docx file
2. Select "Upload and Index"
3. See the error shown under Logs
Screenshots

Logs
Using reader <kotaemon.loaders.unstructured_loader.UnstructuredReader object at 0x7f984bfba020>
No module named 'unstructured'
Traceback (most recent call last):
  File "/media/justin/external/CodeReady/venv-external/lib/python3.10/site-packages/ktem/index/file/pipelines.py", line 795, in stream
    file_id, docs = yield from pipeline.stream(
  File "/media/justin/external/CodeReady/venv-external/lib/python3.10/site-packages/ktem/index/file/pipelines.py", line 642, in stream
    docs = self.loader.load_data(file_path, extra_info=extra_info)
  File "/media/justin/external/CodeReady/venv-external/lib/python3.10/site-packages/kotaemon/loaders/unstructured_loader.py", line 70, in load_data
    from unstructured.partition.auto import partition
ModuleNotFoundError: No module named 'unstructured'
Browsers
No response
OS
Linux
Additional information
No response
The module named 'unstructured' might not be installed. You can install it using pip: pip install unstructured.
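If it is unclear whether the package landed in the same environment that runs kotaemon, a quick check along these lines may help. This is only a sketch: the helper name is_importable is made up for illustration, and it only reports whether the current interpreter can locate the module, not whether the loader will then work.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate module_name.

    Hypothetical helper for illustration; it is not part of kotaemon.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent,
        # or for modules whose __spec__ is unset.
        return False


if __name__ == "__main__":
    # Shows which interpreter (and therefore which venv) is being checked.
    print("Interpreter:", sys.executable)
    for name in ("unstructured",):
        status = "importable" if is_importable(name) else "MISSING"
        print(f"{name}: {status}")
```

Run it with the Python interpreter of the virtual environment that appears in the traceback (venv-external), not the system Python. If it prints MISSING there, pip install most likely ran against a different environment.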
Hmm, OK I installed unstructured. It was indeed not installed. Now there's a different error that blocks the indexing.
It may be faster to reinstall but I've had installation issues: https://github.com/Cinnamon/kotaemon/issues/425