
EmbeddingHandler gets stuck trying to create embeddings for images

Open virunew opened this issue 2 years ago • 3 comments

Hi Team,

When I try to create embeddings for some PDF documents, the program gets stuck after some time. After some investigation, it appears that all the blocks that were pending (where the application got stuck) had content_type: "image" in the MongoDB block document. Perhaps skipping non-text blocks during embedding would work; I am trying this in code.
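A minimal sketch of the workaround suggested above (skipping non-text blocks before embedding). It assumes each block is a plain dict shaped like the MongoDB block documents described here, with a `content_type` key and a `text` key. The helper `filter_embeddable_blocks` and the `block_ID` key in the sample data are hypothetical and are not part of llmware's API.

```python
from typing import Any, Dict, Iterable, List


# Hypothetical helper, not llmware's API: "content_type" and "text" mirror the
# block document fields discussed in this issue; the sample data is made up.
def filter_embeddable_blocks(blocks: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only blocks that are not images and have non-empty text."""
    kept = []
    for block in blocks:
        if block.get("content_type") == "image":
            continue  # image blocks carry no text to embed
        text = block.get("text") or ""  # treat a missing or None text as empty
        if not text.strip():
            continue  # whitespace-only text gives nothing meaningful to embed
        kept.append(block)
    return kept


blocks = [
    {"block_ID": 0, "content_type": "text", "text": "Sample paragraph text."},
    {"block_ID": 1, "content_type": "image", "text": ""},
    {"block_ID": 2, "content_type": "text", "text": "   "},
]
print([b["block_ID"] for b in filter_embeddable_blocks(blocks)])  # → [0]
```

Filtering before the embedding call keeps the embedding loop from ever receiving a block with nothing to embed, whichever embedding DB and model are in use.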

virunew · Dec 27 '23 12:12

Sorry you are running into this issue. A couple of questions/ideas:

  1. Which combination of embedding db and embedding model are you using?
  2. Is there "text" in the image block?

Could a particular combination of embedding DB and embedding model be failing when no text is found in the block?

doberst · Dec 27 '23 14:12

Hi Darren, thanks for the response. I am using Milvus with the embedding model industry-bert-sec. Also, there is no "text" in the image block. Here is a screenshot of one block document that I think is stuck:

[screenshot of a stuck image block document]

virunew · Dec 27 '23 16:12

It seems the code is stuck in the following method in status.py, which starts a thread that keeps polling the embedding status:

def tail_embedding_status(self, library_name, model_name, poll_seconds=0.2):
    # Non-daemon thread (the default): the interpreter waits for it to finish before exiting
    Thread(target=self._tail_embedding_status, args=(library_name, model_name, poll_seconds)).start()

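Because this polling thread is a regular (non-daemon) thread, it keeps the process alive for as long as it runs, which makes it look as if the code is stuck. This can be remedied by making it a daemon thread, as below, so that it no longer keeps the process alive once the main thread has exited (which should be the desired behaviour, IMHO):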

def tail_embedding_status(self, library_name, model_name, poll_seconds=0.2):
    thread = Thread(target=self._tail_embedding_status, args=(library_name, model_name, poll_seconds))
    # Daemon threads do not block interpreter exit; they are stopped when the main thread finishes
    thread.daemon = True
    thread.start()
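The effect of the daemon flag can be checked in isolation. The sketch below is illustrative, not llmware code: it runs a child interpreter whose `tail_status` function (a hypothetical stand-in for `_tail_embedding_status`) polls forever, as it would if some blocks never finish embedding. It then reports whether the child exits within a timeout, once with `daemon = False` and once with `daemon = True`.

```python
import subprocess
import sys

# Child script: a hypothetical stand-in for _tail_embedding_status that polls
# forever, as if the embedding status never reaches "complete".
CHILD_TEMPLATE = """
import time
from threading import Thread

def tail_status(poll_seconds=0.2):
    while True:  # status never completes
        time.sleep(poll_seconds)

thread = Thread(target=tail_status)
thread.daemon = {daemon}
thread.start()
print("main thread done", flush=True)
"""


def process_exits(daemon: bool, timeout: float = 3.0) -> bool:
    """Return True if the child interpreter exits within `timeout` seconds."""
    code = CHILD_TEMPLATE.format(daemon=daemon)
    try:
        subprocess.run([sys.executable, "-c", code], timeout=timeout,
                       capture_output=True, check=True)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the hung child before re-raising TimeoutExpired
        return False


print("non-daemon exits:", process_exits(daemon=False))  # → non-daemon exits: False
print("daemon exits:", process_exits(daemon=True))       # → daemon exits: True
```

In both runs the main thread finishes immediately. Only the non-daemon child hangs, because the interpreter joins non-daemon threads at shutdown.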

If that makes sense, I can raise a PR for it.

virunew · Dec 27 '23 17:12

Thanks for finding this issue and resolving it. Your PR has been merged into the main branch. Appreciate your contribution, and hopefully it is the first of many! :)

doberst · Dec 29 '23 12:12

Thanks for the encouraging comments; I would certainly like to keep contributing. I have not personally found anything better than the llmware framework for RAG yet, so I have decided to use it in my own RAG-based project.

virunew · Dec 29 '23 16:12