Sidekick icon indicating copy to clipboard operation
Sidekick copied to clipboard

FR: Utilise vision models via image attachments

Open juioudgdgeue894 opened this issue 9 months ago • 5 comments

Currently, Sidekick only allows document file uploads but does not support actual image analysis. Adding vision model capabilities would allow users to upload images for processing, enabling object recognition, scene description, text extraction (OCR), and other insights.

juioudgdgeue894 avatar Mar 18 '25 12:03 juioudgdgeue894

@juioudgdgeue894

Thanks for the suggestion! 🙏

Sidekick uses llama.cpp's llama-server for inference. Unfortunately, it doesn't currently support image attachments. However, they are actively working on adding support for this feature.

It seems like a lot of folks want this feature, so I will most likely add support for image attachments (local and remote inference) to Sidekick when llama.cpp brings VLM support to llama-server.

That being said, if anyone wants to implement this feature immediately for remote inference right away, I'm open to a PR!

johnbean393 avatar Mar 18 '25 14:03 johnbean393

Ahh - understood. In that case, I look forward to it being implemented. It'll only take Sidekick up an even further notch!

Wish I could contribute - unfortunately I'm not a programmer! Happy to continue bug testing and feature requesting though. Love the app!

juioudgdgeue894 avatar Mar 19 '25 10:03 juioudgdgeue894

@juioudgdgeue894

Might be a bit ambitious, but when VLM support comes around, I'll see if I can extend resource search in experts to calculate embeddings for images as well, so RAG can be done on images as well.

johnbean393 avatar Mar 30 '25 04:03 johnbean393

@juioudgdgeue894

As of commit #4378a61, support has been added for remote VLMs. This has been tested with OpenRouter and Alibaba Cloud.

johnbean393 avatar Apr 08 '25 04:04 johnbean393

@juioudgdgeue894

Vision support has now been added to llama-server!

I'll work on supporting local VLMs like Gemma 3.

johnbean393 avatar May 10 '25 07:05 johnbean393