Zoli Somogyi
[bmazzarol](https://github.com/bmazzarol), let's assume your benchmark is accurate and that EmbedBatch1 really takes 22 ms to create. Of that time, the majority is likely spent on generating the embedding itself, so we can...
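For reference, a minimal sketch (not the benchmark itself) of how to split model-load time from context-setup time with the llama.cpp C API; exact signatures drift between versions, and the model path is a placeholder:

```cpp
// Rough timing split: model load vs. context creation. Tokenization and
// the llama_decode call that produces the embedding would be timed the
// same way. Signatures follow older llama.cpp builds; adjust to yours.
#include "llama.h"
#include <chrono>
#include <cstdio>

int main() {
    using clock = std::chrono::steady_clock;

    llama_backend_init();

    auto t0 = clock::now();
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    auto t1 = clock::now();

    llama_context_params cparams = llama_context_default_params();
    cparams.embeddings = true; // named `embedding` in some older builds
    llama_context * ctx = llama_new_context_with_model(model, cparams);
    auto t2 = clock::now();

    // ... tokenize the input, call llama_decode, and time that span too ...

    auto ms = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    printf("load: %lld ms, context init: %lld ms\n",
           (long long)ms(t0, t1), (long long)ms(t1, t2));

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```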
I would expect the model to keep its KV cache GPU memory space and simply reset it, without needing to reallocate it. The model should not need...
Everything is kept after loading the model, but the KV cache is allocated anew every time I run inference. This is the problem. You can test it by using 2 models which...
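The expected pattern would look like this sketch: one long-lived context whose KV cache buffers stay allocated, with only the contents cleared between inferences. Note that `llama_kv_cache_clear` is the historical name; newer llama.cpp builds rename it (e.g. `llama_kv_self_clear`), so adjust to your version:

```cpp
// Keep one context (and its KV cache buffers) alive; only clear the
// cache contents between inferences instead of reallocating.
#include "llama.h"

void run_many(llama_context * ctx, llama_batch batch, int n_runs) {
    for (int i = 0; i < n_runs; ++i) {
        llama_kv_cache_clear(ctx);   // reset contents; should not reallocate
        llama_decode(ctx, batch);    // reuse the same pre-allocated buffers
    }
}
```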
I have created a test program and was able to narrow the problem down further. The crash occurs when you load model A but do not use it immediately, but...
No, you forgot to mention the last step, where GPU memory is allocated once more when the model is used for the first time. This is not as expected. What...
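A sketch of the sequence described above (paths are placeholders, and the API names track older llama.cpp builds): load model A, do not touch it, load model B, then use A for the first time. The first `llama_decode` on A is where the additional GPU allocation happens, which can fail if B has claimed that memory in the meantime:

```cpp
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mp = llama_model_default_params();

    llama_model   * model_a = llama_load_model_from_file("model_a.gguf", mp);
    llama_context * ctx_a   = llama_new_context_with_model(model_a, llama_context_default_params());

    llama_model   * model_b = llama_load_model_from_file("model_b.gguf", mp);
    llama_context * ctx_b   = llama_new_context_with_model(model_b, llama_context_default_params());

    llama_token tok = llama_token_bos(model_a);       // any valid token works for the repro
    llama_batch batch = llama_batch_get_one(&tok, 1); // older builds also take pos/seq_id

    llama_decode(ctx_a, batch); // first use of A: allocates more GPU memory here

    llama_free(ctx_a);  llama_free_model(model_a);
    llama_free(ctx_b);  llama_free_model(model_b);
    llama_backend_free();
    return 0;
}
```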
ggml-cuda.cu crashes after this:

```
llama.dll!llama_graph_compute(llama_context & lctx, ggml_cgraph * gf, int n_threads) Line 11094  C++
llama.dll!llama_decode_internal(llama_context & lctx, llama_batch batch_all) Line 11336  C++
llama.dll!llama_decode(llama_context * ctx, llama_batch batch) Line...
```
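One thing worth doing while debugging this: `llama_decode` returns a status code, so checking it separates API-level failures (e.g. a full KV cache) from hard aborts inside the CUDA backend like the one in the stack above. A crash inside ggml-cuda.cu will of course never return here, but a clean non-zero code rules that path out:

```cpp
#include "llama.h"
#include <cstdio>

bool decode_checked(llama_context * ctx, llama_batch batch) {
    const int rc = llama_decode(ctx, batch); // 0 = success, non-zero = failure
    if (rc != 0) {
        fprintf(stderr, "llama_decode failed with code %d\n", rc);
        return false;
    }
    return true;
}
```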
GPU memory handling is a sensitive issue in llama.cpp; I have not gotten an answer to my question of why my models use 20% more GPU memory today compared to...
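To put numbers on observations like that "20% more", a simple approach is to sample free VRAM via the CUDA runtime before and after a model load and compare the delta across library versions; a minimal sketch:

```cpp
// cudaMemGetInfo reports free/total device memory for the current GPU,
// so the difference across a model load gives the allocation size.
#include <cuda_runtime.h>
#include <cstdio>

size_t free_vram_bytes() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    return free_b;
}

// Usage: size_t before = free_vram_bytes(); /* load model here */
// printf("model uses %.1f MiB\n",
//        (before - free_vram_bytes()) / (1024.0 * 1024.0));
```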
I have investigated the issue further, and the library crashes even when using only one model that does not fully fit into GPU memory. The problem is the additional...
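When a model does not fully fit, one workaround is to offload fewer layers so that VRAM headroom remains for the additional buffers (KV cache, compute buffers) allocated at context creation and first use. A hedged sketch; the layer count and path are placeholders:

```cpp
#include "llama.h"

llama_model * load_partially_offloaded(const char * path) {
    llama_model_params mp = llama_model_default_params();
    mp.n_gpu_layers = 20; // keep remaining layers on the CPU, freeing VRAM
    return llama_load_model_from_file(path, mp);
}
```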
This is exactly one of the reasons why you should compile the code yourself instead of using pre-compiled packages. Even if you find the missing DLLs now, the problem could...
Please check out the comments in https://github.com/SciSharp/LLamaSharp/issues/1259. This is not a bug but efficient context handling, and we really need it like this as standard behavior. You will need a...
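The context-handling pattern this refers to presumably looks like the following sketch (an assumption based on the comment, not the library's actual code): the weights are loaded once and shared, while each conversation gets its own context that is created and freed on demand, so no state leaks between conversations:

```cpp
#include "llama.h"

struct Engine {
    llama_model * model; // loaded once, shared by all conversations
};

llama_context * begin_conversation(Engine & e) {
    return llama_new_context_with_model(e.model, llama_context_default_params());
}

void end_conversation(llama_context * ctx) {
    llama_free(ctx); // releases this conversation's KV cache
}
```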