
A way to index image documents, using file name as the text document?

Open chengyjonathan opened this issue 1 year ago • 5 comments

Hi there!

I was wondering if there's a way to create a GPTSimpleVectorIndex that indexes image documents against filename embeddings?

I.e., I title the image files descriptively ("Some Description of the Image.jpeg"), so that when the user queries "Can you return images that fit the Description", the index returns the image using the embedding for "Some Description of the Image".

chengyjonathan avatar Mar 22 '23 04:03 chengyjonathan

I tried the following, where I keep the images but don't try to parse text from them.

I thought that supplying the filenames through the file_metadata function would add them as extra info for indexing, but all queries return 0 image documents.

When I read the docs, it seems there's a way to provide extra info using filenames, but it's unclear whether those filenames can be used as the document text.

import llama_index
import llama_index.readers.file.base

# Keep the image on the document, but do not extract any text from it.
image_parser = llama_index.readers.file.base.ImageParser(
    keep_image=True,
    parse_text=False,
)
# Copy the defaults so the library-wide mapping is not mutated in place.
file_extractor = dict(llama_index.readers.file.base.DEFAULT_FILE_EXTRACTOR)
file_extractor.update({
    ".jpg": image_parser,
    ".png": image_parser,
    ".jpeg": image_parser,
})

# Attach each file's name as metadata (extra info) on its document.
def filename_fn(filename):
    return {'file_name': filename}

image_reader = llama_index.SimpleDirectoryReader(
    input_dir='path to image files',
    file_extractor=file_extractor,
    file_metadata=filename_fn,
)
image_documents = image_reader.load_data()
image_index = llama_index.GPTSimpleVectorIndex(image_documents)

chengyjonathan avatar Mar 22 '23 04:03 chengyjonathan

My understanding is that "include_extra_info" is enabled by default, and that it should prepend each document's text with the metadata text (in this case, the filename).

But I might be misunderstanding, and this just prepends the query with the extra metadata provided?
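
To make the first reading concrete, here is a minimal standalone sketch of what "prepending the metadata to the document text" could look like. It does not call llama_index; the function name `text_with_extra_info` and the `key: value` header layout are assumptions for illustration, not the library's confirmed behavior.

```python
def text_with_extra_info(text: str, extra_info: dict) -> str:
    """Prepend 'key: value' metadata lines to a document body (illustrative only)."""
    header = "\n".join(f"{key}: {value}" for key, value in extra_info.items())
    # With an empty body, the metadata header is the only text left to embed.
    return f"{header}\n\n{text}".strip()


# An image document loaded with parse_text=False has no body text of its own.
combined = text_with_extra_info("", {"file_name": "Some Description of the Image.jpeg"})
print(combined)
```

If the index worked this way, the embedded text for an image document would consist of the filename line alone, which is the behavior the question is hoping for.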

chengyjonathan avatar Mar 22 '23 05:03 chengyjonathan

I think your understanding is correct. I'm surprised that this does not work.

One way to debug this is to build the GPTSimpleVectorIndex on only image documents, and check the source nodes from the response. You can see if the text properly contains the filename metadata or not.

Disiok avatar Mar 25 '23 02:03 Disiok

I got this to work, though in an unintended way I think: I overrode the Image Parser to use the filename as the text string for the image document. I don't think the intended approach currently works out of the box, though.
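
The override itself is not shown in this thread, so the following is only a guess at its general shape: a parser that returns a description derived from the file name instead of reading the image. The class name `FilenameAsTextParser` and its `parse_file` method are hypothetical and are not tied to llama_index's parser base class.

```python
from pathlib import Path


class FilenameAsTextParser:
    """Hypothetical parser: builds the document text from the file name only."""

    def parse_file(self, file: Path) -> str:
        # "Some_Description-of the Image.jpeg" -> "Some Description of the Image"
        stem = Path(file).stem
        return " ".join(stem.replace("_", " ").replace("-", " ").split())


parser = FilenameAsTextParser()
print(parser.parse_file(Path("photos/Some Description of the Image.jpeg")))
```

To use something like this with the file_extractor mapping from the earlier snippet, it would have to be adapted to whatever parser interface the installed llama_index version expects.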

chengyjonathan avatar Mar 25 '23 16:03 chengyjonathan

The issue this brings, however, shows up with multimodal queries, i.e. "Find me this image and tell me something from a different document": the response will try to say something using the filename text.
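
One possible way around this, sketched under assumptions, is to split the retrieved hits before the answer is written: image hits are returned for display only, and only text hits are passed on as context. The `Hit` type and its `is_image` flag are hypothetical and would have to be set at indexing time; nothing here is an existing llama_index feature.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    text: str
    is_image: bool  # hypothetical flag recorded when the document is indexed


def split_hits(hits):
    """Return (image hits to display, text hits to use as answer context)."""
    images = [h for h in hits if h.is_image]
    context = [h for h in hits if not h.is_image]
    return images, context


hits = [
    Hit("Some Description of the Image", True),
    Hit("Quarterly report text", False),
]
images, context = split_hits(hits)
print([h.text for h in images], [h.text for h in context])
```

With this split, the filename text still drives retrieval of the image but never reaches the part of the pipeline that writes the answer.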

chengyjonathan avatar Mar 25 '23 16:03 chengyjonathan

Yeah, I also ran into this. I used https://github.com/jerryjliu/llama_index/blob/main/examples/multimodal/Multimodal.ipynb , but for my use case, combining parsed text from the image with the text documents gives unexpected results, and it is a bit much if you only want to display the image, for example in a chatbot context.

mfmjh avatar Apr 19 '23 10:04 mfmjh

Hi, @chengyjonathan! I'm here to help the LlamaIndex team manage their backlog and I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you were asking if it is possible to create an index for image documents using the file name as the text document. However, it seems that this approach does not currently work as all queries return 0 image documents. Some users suggested debugging by checking the source nodes from the response.

You mentioned that you found a workaround by overwriting the Image Parser, but this caused issues with multimodal queries. Another user also encountered unexpected results when using parsed text from images in combination with text documents.

If this issue is still relevant to the latest version of the LlamaIndex repository, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LlamaIndex project!

dosubot[bot] avatar Aug 20 '23 16:08 dosubot[bot]