
Support Image Uploading

Open timothycarambat opened this issue 1 year ago • 3 comments

The document processor should support the uploading and embedding of images like PNG, JPEG, and other static formats.

Ideally, the processor should generate a text description of the image and embed that text, rather than attempting a multi-modal embedding, which would be impossible to search over textually.
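
A minimal sketch of that describe-then-embed idea, in Python for illustration only. The names `image_to_document`, `TextDocument`, and the `describe` callable are hypothetical and not part of the project; `describe` stands in for whatever captioning model or hosted API ends up being used.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Assumption: the static formats named in the issue, not an official list.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


@dataclass
class TextDocument:
    """Plain-text stand-in for an image, ready for a normal text embedder."""
    source: str
    text: str


def image_to_document(path: str, describe: Callable[[bytes], str]) -> TextDocument:
    """Turn an image file into a text document by captioning it.

    `describe` is a hypothetical captioner: it receives the raw image bytes
    and returns a textual description.
    """
    p = Path(path)
    if p.suffix.lower() not in IMAGE_EXTENSIONS:
        raise ValueError(f"{p.suffix} is not a supported image extension")
    caption = describe(p.read_bytes()).strip()
    if not caption:
        raise ValueError(f"no description produced for {p.name}")
    # Only the caption is kept, so the result can be searched as ordinary text.
    return TextDocument(source=p.name, text=caption)
```

The returned `text` would then go through the same text-embedding path as any other document, which is what keeps it searchable.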

timothycarambat avatar Jun 27 '23 17:06 timothycarambat

Are you thinking that it should use something like BERT or Deepdanbooru to extract info from the image?

AntonioCiolino avatar Jul 06 '23 03:07 AntonioCiolino

Are you thinking that it should use something like BERT or Deepdanbooru to extract info from the image?

Both of these would be a problem to run locally, since they require significant resources. Deepdanbooru is also specific to anime-girl image tagging and tends to give more NSFW results. Honestly, the easiest implementation is to use something simple like OpenAI's CLIP, which can run on Replicate pretty easily (but will still cost money).

https://replicate.com/rmokady/clip_prefix_caption

timothycarambat avatar Jul 06 '23 04:07 timothycarambat

If you are calling out to external resources, there are lots of choices, of course.

AntonioCiolino avatar Jul 06 '23 12:07 AntonioCiolino

It returns: "anythingllm File extension .jpg not supported for parsing and cannot be assumed as text file type."

phicha20224 avatar May 30 '24 08:05 phicha20224

@phicha20224 - that is because we don't support uploading images right now.
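
As a rough illustration of why a `.jpg` is rejected, here is a sketch of an extension check that produces the message quoted above. This is an assumption about the general shape of such a check, not the project's actual code; `check_parsable` and `TEXT_EXTENSIONS` are hypothetical names, and the extension list is made up for the example.

```python
from pathlib import Path

# Hypothetical allow-list of extensions that can be parsed as text.
TEXT_EXTENSIONS = {".txt", ".md"}


def check_parsable(filename: str) -> None:
    """Raise if the file's extension has no parser and is not known to be text."""
    ext = Path(filename).suffix.lower()
    if ext not in TEXT_EXTENSIONS:
        # Message wording copied from the error reported in this thread.
        raise ValueError(
            f"File extension {ext} not supported for parsing "
            "and cannot be assumed as text file type."
        )
```

Under this sketch, image support would mean adding a handler that converts the image to text before this check rejects it.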

timothycarambat avatar May 30 '24 14:05 timothycarambat

What do I need to do to get support for images?

rainbowkode avatar Jun 15 '24 09:06 rainbowkode