
Any plans to bring llava support?

Open vshapenko opened this issue 1 year ago • 22 comments

Hi! Are there any plans to support llava as well? As I see, it was merged into llama.cpp about a month ago and makes it possible to work with image recognition as well.

vshapenko avatar Dec 01 '23 12:12 vshapenko

I have a prototype working. I will see if I can clean and finish the project this weekend.

SignalRT avatar Dec 01 '23 13:12 SignalRT

> I have a prototype working. I will see if I can clean and finish the project this weekend.

Hello! Are there any news?

vshapenko avatar Dec 05 '23 10:12 vshapenko

Looking forward to the change! Just discussed how this will help significantly to solve a use-case for our internal application. Do you have it available in your fork already? I'd like to pull it into mine and play around with it.

philippjbauer avatar Dec 07 '23 15:12 philippjbauer

I would like to help too. This is very promising.


vshapenko avatar Dec 07 '23 15:12 vshapenko

I will try to finish the library a little more this weekend before making this public. Until now, it has only been tested on osx-arm64.

SignalRT avatar Dec 07 '23 22:12 SignalRT

Cool, thank you! I'm on osx-arm64 and can do some testing (have my colleague do it perhaps too) after implementing it in our app.

philippjbauer avatar Dec 07 '23 23:12 philippjbauer

Is there any update on this? One use case is OCR.

Thanks, Ash

AshD avatar Jan 30 '24 17:01 AshD

I have a branch in my fork with part of the changes. Building binaries, including the runtime, etc. are things that I haven't done before; I first did the development with manually built binaries.

With the first versions of January, it worked.

With this prompt: What is unusual about this image? and this picture:

image

This is the output:

image

But since PR #445 it crashes inside llama.cpp. I'm trying to identify the root cause and get this working again.

SignalRT avatar Jan 30 '24 21:01 SignalRT

Thanks for the update @SignalRT

Happy to test this when you have it working.

AshD avatar Jan 30 '24 21:01 AshD

@SignalRT If there is any progress with this, I would be very glad to evaluate.

Btw, where is the code repo with the working branch you mentioned above?

dcostea avatar Feb 06 '24 13:02 dcostea

I have also tried to do it and have a very simple demo. I'm trying to rewrite my code using LLamaSharp.

The demo is like this: Prompt: describe the image in detail. Input Image: test

Output: output

IntptrMax avatar Feb 23 '24 00:02 IntptrMax

I will work this weekend to try to publish my work. The work will be in my branch: https://github.com/SignalRT/LLamaSharp/tree/MultiModal until PR.

SignalRT avatar Feb 23 '24 06:02 SignalRT

@SignalRT For now I switched to plan B, OllamaSharp, but I'm happy to hear that I will be able to switch back soon. Good luck!

dcostea avatar Feb 23 '24 08:02 dcostea

@SignalRT please put the code you have (LLava) into your branch so that we can help you finalize it (better to call it LLava instead of LLavaSharp).

zsogitbe avatar Feb 26 '24 14:02 zsogitbe

@IntptrMax, I have looked at your example. It is a very good attempt! I have noticed a few bugs with marshaling the C++ output; for example,

public extern static llava_image_embed llava_image_embed_make_with_filename(IntPtr clip_ctx, int n_threads, string image);

should be changed to

public extern static IntPtr llava_image_embed_make_with_filename(IntPtr clip_ctx, int n_threads, string image);

because the C++ side returns a pointer to the structure, and if you do not do this you will get some random problems... You can marshal the IntPtr like this:

llava_image_embed image_embed = (llava_image_embed)Marshal.PtrToStructure(intptr_to_image_embed, typeof(llava_image_embed));

where intptr_to_image_embed is the output from llava_image_embed_make_with_filename. Maybe look at all of your functions and make sure that the marshaling is OK everywhere (I did not check all).
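For reference, here is a minimal sketch of the corrected binding, assuming the llava_image_embed layout from llama.cpp's llava.h (float* embed; int n_image_pos;). The library name "llava" and the usage lines are illustrative, not LLamaSharp's actual API:

using System;
using System.Runtime.InteropServices;

// Managed mirror of llama.cpp's llava_image_embed struct.
[StructLayout(LayoutKind.Sequential)]
public struct llava_image_embed
{
    public IntPtr embed;     // float*: raw embedding data
    public int n_image_pos;  // number of image embedding positions (tokens)
}

public static class NativeLlava
{
    // Placeholder library name; use whatever binary your build produces.
    const string Lib = "llava";

    // Returns a pointer to a natively allocated llava_image_embed.
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr llava_image_embed_make_with_filename(
        IntPtr clip_ctx, int n_threads, string image);

    // Frees the structure allocated by the call above.
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern void llava_image_embed_free(IntPtr embed);
}

// Usage: marshal the pointer into the managed struct, then free it.
// IntPtr ptr = NativeLlava.llava_image_embed_make_with_filename(clipCtx, 4, "test.jpg");
// var embed = Marshal.PtrToStructure<llava_image_embed>(ptr);
// ... use embed.embed / embed.n_image_pos ...
// NativeLlava.llava_image_embed_free(ptr);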

Also, there is a problem with the context size. If the number of tokens in the image embedding is higher than the context size (n_ctx), then the program will crash in the function llava_eval_image_embed. For example, the default 2048 will not work using my model with your example image, because it has 2880 tokens! We need to adjust the context size based on the model.

zsogitbe avatar Feb 27 '24 07:02 zsogitbe

@IntptrMax, I have quickly corrected your example and it seems that if we use llava_image_embed_make_with_filename with the above correction and a higher context size (4096), then it works:

Screenshot 2024-02-27 091013

zsogitbe avatar Feb 27 '24 08:02 zsogitbe

@zsogitbe Thanks a lot! I have had the same problem when evaluating several images; that's a good way to solve it.

IntptrMax avatar Feb 27 '24 08:02 IntptrMax

I needed a minimum context size of image embedding size (2880 tokens) + batch size (512). In your example, 2880 + 512 = 3392! The image embedding size depends on the model!
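As a sanity check (a sketch; the helper name and the default batch size of 512 are illustrative):

// Minimum n_ctx needed to evaluate an image embedding followed by
// one batch of text tokens.
static int MinimumContextSize(int imageEmbedTokens, int batchSize = 512)
    => imageEmbedTokens + batchSize;

// MinimumContextSize(2880) == 3392, matching the example above.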

zsogitbe avatar Feb 27 '24 08:02 zsogitbe

PR with first draft: #555

martindevans avatar Feb 28 '24 23:02 martindevans

The first PR to build llava binaries #556

SignalRT avatar Mar 01 '24 05:03 SignalRT

I have tried to add llava to LLamaSharp and it works, but it still needs improvement. My demo is at https://github.com/IntptrMax/LLamaSharp/tree/add_llava

IntptrMax avatar Mar 05 '24 09:03 IntptrMax

It works @IntptrMax, I have tested it, but there is a memory leak. In my trial, 1.8 GB of GPU memory is not freed. Try to find out how to free the GPU memory, because 1.8 GB is too much, and add some checks at the end to verify that GPU memory is properly freed. One of the extra releasing options is llama_grammar_free(ctx_sampling.grammar).
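For anyone chasing the leak, a hedged cleanup sketch, assuming the P/Invoke names sketched earlier in this thread plus llama.cpp's clip_free and llama_free; whether these calls account for the full 1.8 GB is untested:

// Free native resources in reverse order of allocation.
NativeLlava.llava_image_embed_free(imageEmbedPtr); // per-image embedding buffer
// clip_free(clipCtx);    // CLIP model weights (GPU + host memory)
// llama_free(llamaCtx);  // llama context / KV cache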

zsogitbe avatar Mar 05 '24 10:03 zsogitbe