EmbeddingHandler gets stuck trying to create embeddings for images
Hi Team,
When I try to create embeddings for some PDF documents, the program gets stuck after some time. After some investigation, it appears that all the blocks that were pending (where the application got stuck) had content_type: "image" in the MongoDB block document. Perhaps skipping the embedding step for non-text blocks would work; I am trying this in code.
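A minimal sketch of the workaround described above: filter out non-text blocks before they are sent for embedding. It assumes each block is a plain dict shaped like the MongoDB block documents, with a "content_type" field (where "text" marks text blocks) and a "text" field. The helper name select_embeddable_blocks and the "id" key are hypothetical and not part of llmware's API.

```python
from typing import Any, Dict, Iterable, List


def select_embeddable_blocks(blocks: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical pre-filter: keep only text blocks that contain non-blank text.

    Assumes blocks are dicts with "content_type" and "text" keys, mirroring
    the MongoDB block documents described in the issue.
    """
    selected = []
    for block in blocks:
        if block.get("content_type") != "text":
            continue  # skip images and any other non-text blocks
        if not (block.get("text") or "").strip():
            continue  # skip blocks whose text is empty or whitespace only
        selected.append(block)
    return selected


blocks = [
    {"id": 0, "content_type": "text", "text": "Revenue grew 12%."},
    {"id": 1, "content_type": "image", "text": ""},
    {"id": 2, "content_type": "text", "text": "   "},
]
print([b["id"] for b in select_embeddable_blocks(blocks)])  # [0]
```

Checking for blank text as well as content type also covers the case raised in the reply below, where a block may reach the embedding model with no text at all.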
Sorry you are running into this issue. A couple of questions/ideas -
- Which combination of embedding db and embedding model are you using?
- Is there "text" in the image block?
I'm wondering whether there is a problem with this combination of embedding DB and embedding model when no text is found in the block.
Hi Darren, thanks for the response. I am using Milvus with the embedding model industry-bert-sec. Also, there is no "text" in the image block. I am pasting a screenshot of one document that I think is stuck.
It seems the code is stuck at the following method in status.py, which starts a thread that keeps polling for the embedding status:
def tail_embedding_status(self, library_name, model_name, poll_seconds=0.2):
    Thread(target=self._tail_embedding_status, args=(library_name, model_name, poll_seconds)).start()
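To illustrate why the process hangs, here is a minimal sketch using only the standard threading module. A new Thread inherits its daemon flag from the thread that creates it, so a poller started from the main thread is non-daemon, and the interpreter waits for all non-daemon threads before exiting. The function _tail_status is a hypothetical stand-in for _tail_embedding_status; the stop Event exists only so this demo can terminate.

```python
import threading
import time


def _tail_status(stop: threading.Event, poll_seconds: float) -> None:
    # Hypothetical stand-in for _tail_embedding_status: poll until told to stop.
    while not stop.is_set():
        time.sleep(poll_seconds)


stop = threading.Event()
t = threading.Thread(target=_tail_status, args=(stop, 0.05))
t.start()

# The main thread is non-daemon, so this poller is non-daemon too; if its
# loop never ended, the interpreter would wait for it forever at exit.
print(t.daemon)      # False
print(t.is_alive())  # True

# Only the stop signal lets this demo finish cleanly.
stop.set()
t.join()
print(t.is_alive())  # False
```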
However, this only gives the appearance that the code is stuck: the non-daemon polling thread keeps running, so the process never exits. This can be remedied by making it a daemon thread, as below, so that the interpreter no longer waits for it once the main thread has finished (which should be the desired behaviour, IMHO):
def tail_embedding_status(self, library_name, model_name, poll_seconds=0.2):
    thread = Thread(target=self._tail_embedding_status, args=(library_name, model_name, poll_seconds))
    thread.daemon = True
    thread.start()
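A minimal standalone sketch of the daemon variant, again with a hypothetical _tail_status poller whose loop never returns. Passing daemon=True to the Thread constructor is equivalent to setting thread.daemon before start(); the flag must be set before the thread is started.

```python
import threading
import time

polls = []


def _tail_status(poll_seconds: float) -> None:
    # Hypothetical stand-in for an endless status poller: it never returns.
    while True:
        polls.append(time.monotonic())
        time.sleep(poll_seconds)


# Same effect as thread.daemon = True before thread.start().
thread = threading.Thread(target=_tail_status, args=(0.01,), daemon=True)
thread.start()
time.sleep(0.05)  # let the poller run a few iterations

print(thread.daemon)  # True
# When the main thread finishes, the interpreter exits without waiting for
# this daemon thread, even though its loop is still running.
```

One trade-off worth noting: daemon threads are stopped abruptly at shutdown, so they should not hold resources that need orderly cleanup. For a read-only status poller that is usually acceptable.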
If that makes sense, I can raise a PR for it.
Thanks for finding this issue and resolving it - your PR has been merged into the main branch. Appreciate your contribution - and hopefully, the first of many! :)
Thanks for the encouraging comments; I would certainly like to keep contributing. I have not personally found anything better than the llmware framework for RAG yet, so I have decided to use it in my own RAG-based project.