
Any plans to bring llava support?

Open vshapenko opened this issue 1 year ago • 22 comments

Hi! Are there any plans to support llava as well? As I see, it was merged into llama.cpp about a month ago and makes it possible to work with image recognition as well.

vshapenko avatar Dec 01 '23 12:12 vshapenko

I have a prototype working. I will see if I can clean and finish the project this weekend.

SignalRT avatar Dec 01 '23 13:12 SignalRT

> I have a prototype working. I will see if I can clean and finish the project this weekend.

Hello! Are there any news?

vshapenko avatar Dec 05 '23 10:12 vshapenko

Looking forward to the change! Just discussed how this will help significantly to solve a use-case for our internal application. Do you have it available in your fork already? I'd like to pull it into mine and play around with it.

philippjbauer avatar Dec 07 '23 15:12 philippjbauer

I would like to help too. This is very promising.


vshapenko avatar Dec 07 '23 15:12 vshapenko

I will try to finish the library a little more this weekend before making this public. Until now, it has only been tested on osx-arm64.

SignalRT avatar Dec 07 '23 22:12 SignalRT

Cool, thank you! I'm on osx-arm64 and can do some testing (have my colleague do it perhaps too) after implementing it in our app.

philippjbauer avatar Dec 07 '23 23:12 philippjbauer

Is there any update on this? One use case is OCR.

Thanks, Ash

AshD avatar Jan 30 '24 17:01 AshD

I have a branch in my fork with part of the changes. Building binaries, including the runtime, etc. are things that I haven't done before; I first did the development with manually built binaries.

With the first versions of January, it worked.

With this prompt: What is unusual about this image? and this picture:

image

This is the output:

image

But since PR #445 it crashes inside llama.cpp. I'm trying to identify the root cause and get this working again.

SignalRT avatar Jan 30 '24 21:01 SignalRT

Thanks for the update @SignalRT

Happy to test this when you have it working.

AshD avatar Jan 30 '24 21:01 AshD

@SignalRT If there is any progress with this, I would be very glad to evaluate.

Btw, where is the code repo with the working branch you mentioned above?

dcostea avatar Feb 06 '24 13:02 dcostea

I have also tried to do it and have a very simple demo. I'm trying to rewrite my code using LLamaSharp.

The demo is like this: Prompt: describe the image in detail. Input Image: test

Output: output

IntptrMax avatar Feb 23 '24 00:02 IntptrMax

I will work this weekend to try to publish my work. The work will be in my branch: https://github.com/SignalRT/LLamaSharp/tree/MultiModal until PR.

SignalRT avatar Feb 23 '24 06:02 SignalRT

@SignalRT For now I switched to plan B, OllamaSharp, but I'm happy to hear that I will be able to switch back soon. Good luck!

dcostea avatar Feb 23 '24 08:02 dcostea

@SignalRT please put the code you have (LLava) into your branch so that we can help you finalize it (better to call it LLava instead of LLavaSharp).

zsogitbe avatar Feb 26 '24 14:02 zsogitbe

@IntptrMax, I have looked at your example. It is a very good attempt! I have noticed a few bugs with marshaling the C++ output; for example,

public extern static llava_image_embed llava_image_embed_make_with_filename(IntPtr clip_ctx, int n_threads, string image);

should be changed to

public extern static IntPtr llava_image_embed_make_with_filename(IntPtr clip_ctx, int n_threads, string image);

because the C++ side returns a pointer to the structure, and if you do not do this you will get some random problems... You can marshal the IntPtr like this:

llava_image_embed image_embed = (llava_image_embed)Marshal.PtrToStructure(intptr_to_image_embed, typeof(llava_image_embed));

where intptr_to_image_embed is the output from llava_image_embed_make_with_filename. Maybe look at all of your functions and make sure that the marshaling is OK everywhere (I did not check all).
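For reference, here is a minimal sketch of the corrected binding, assuming the llava_image_embed layout from llama.cpp's llava.h (float* embed; int n_image_pos;). The library name "llava" and the usage lines are illustrative, not LLamaSharp's actual API:

using System;
using System.Runtime.InteropServices;

// Managed mirror of llama.cpp's llava_image_embed struct.
[StructLayout(LayoutKind.Sequential)]
public struct llava_image_embed
{
    public IntPtr embed;     // float*: raw embedding data
    public int n_image_pos;  // number of image embedding positions (tokens)
}

public static class NativeLlava
{
    // Placeholder library name; use whatever binary your build produces.
    const string Lib = "llava";

    // Returns a pointer to a natively allocated llava_image_embed.
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr llava_image_embed_make_with_filename(
        IntPtr clip_ctx, int n_threads, string image);

    // Frees the structure allocated by the call above.
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern void llava_image_embed_free(IntPtr embed);
}

// Usage: marshal the pointer into the managed struct, then free it.
// IntPtr ptr = NativeLlava.llava_image_embed_make_with_filename(clipCtx, 4, "test.jpg");
// var embed = Marshal.PtrToStructure<llava_image_embed>(ptr);
// ... use embed.embed / embed.n_image_pos ...
// NativeLlava.llava_image_embed_free(ptr);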

Also, there is a problem with the context size. If the number of tokens in the image embedding is higher than the context size (n_ctx), then the program will crash in the function llava_eval_image_embed. For example, the default 2048 will not work using my model with your example image, because it has 2880 tokens! We need to adjust the context size based on the model.

zsogitbe avatar Feb 27 '24 07:02 zsogitbe

@IntptrMax, I have quickly corrected your example and it seems that if we use llava_image_embed_make_with_filename with the above correction and a higher context size (4096), then it works:

Screenshot 2024-02-27 091013

zsogitbe avatar Feb 27 '24 08:02 zsogitbe

@zsogitbe Thanks a lot! I have had the same problem when evaluating several images; that's a good way to solve it.

IntptrMax avatar Feb 27 '24 08:02 IntptrMax

I needed a minimum context size of image embedding size (2880 tokens) + batch size (512). In your example, 2880 + 512 = 3392! The image embedding size depends on the model!
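As a sanity check (a sketch; the helper name and the default batch size of 512 are illustrative):

// Minimum n_ctx needed to evaluate an image embedding followed by
// one batch of text tokens.
static int MinimumContextSize(int imageEmbedTokens, int batchSize = 512)
    => imageEmbedTokens + batchSize;

// MinimumContextSize(2880) == 3392, matching the example above.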

zsogitbe avatar Feb 27 '24 08:02 zsogitbe

PR with first draft: #555

martindevans avatar Feb 28 '24 23:02 martindevans

The first PR to build llava binaries #556

SignalRT avatar Mar 01 '24 05:03 SignalRT

I have tried to add llava to LLamaSharp and it works, but it still needs improvement. My demo is at https://github.com/IntptrMax/LLamaSharp/tree/add_llava

IntptrMax avatar Mar 05 '24 09:03 IntptrMax

It works @IntptrMax, I have tested it, but there is a memory leak. In my trial, 1.8 GB of GPU memory is not freed. Try to find out how to free the GPU memory, because 1.8 GB is too much, and add some checks at the end to verify that GPU memory is properly freed. One of the extra releasing options is llama_grammar_free(ctx_sampling.grammar).
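For anyone chasing the leak, a hedged cleanup sketch, assuming the P/Invoke names sketched earlier in this thread plus llama.cpp's clip_free and llama_free; whether these calls account for the full 1.8 GB is untested:

// Free native resources in reverse order of allocation.
NativeLlava.llava_image_embed_free(imageEmbedPtr); // per-image embedding buffer
// clip_free(clipCtx);    // CLIP model weights (GPU + host memory)
// llama_free(llamaCtx);  // llama context / KV cache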

zsogitbe avatar Mar 05 '24 10:03 zsogitbe