Diego Devesa

361 comments by Diego Devesa

I don't think that we launch enough kernels for this to make a meaningful difference.

Hi @agray3, you can fork the project on GitHub and push the branch to your fork. Then you will have the option to open a PR from the changes in...

We allocate all the KV memory required for the maximum context length on startup in one block, so we shouldn't have any fragmentation either.
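
To illustrate the idea, here is a minimal sketch of a KV cache that reserves one contiguous block sized for the maximum context length and then only hands out offsets into it. The names `KvParams`, `kv_bytes`, and `KvCache`, as well as the memory layout, are hypothetical and chosen for illustration; this is not the project's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model dimensions; illustrative names, not a real API.
struct KvParams {
    std::size_t n_layer;   // number of transformer layers
    std::size_t n_ctx;     // maximum context length in tokens
    std::size_t n_embd;    // elements in one K (or V) row per token
    std::size_t elem_size; // bytes per element, e.g. 2 for f16
};

// Bytes needed for K and V across all layers at the maximum context length.
inline std::size_t kv_bytes(const KvParams & p) {
    return 2 * p.n_layer * p.n_ctx * p.n_embd * p.elem_size;
}

// Owns one contiguous block, allocated once in the constructor.
class KvCache {
public:
    explicit KvCache(const KvParams & p) : p_(p), buf_(kv_bytes(p)) {}

    std::size_t size_bytes() const { return buf_.size(); }

    // Row lookups only compute an offset; nothing is allocated or freed here.
    std::uint8_t * k_row(std::size_t layer, std::size_t pos) {
        return buf_.data() + offset(0, layer, pos);
    }
    std::uint8_t * v_row(std::size_t layer, std::size_t pos) {
        return buf_.data() + offset(1, layer, pos);
    }

private:
    // Assumed layout: [K or V][layer][token position][row bytes].
    std::size_t offset(std::size_t kv, std::size_t layer, std::size_t pos) const {
        const std::size_t row = p_.n_embd * p_.elem_size;
        return ((kv * p_.n_layer + layer) * p_.n_ctx + pos) * row;
    }

    KvParams p_;
    std::vector<std::uint8_t> buf_;
};
```

Because the buffer is sized for the worst case up front and never resized, filling the context does not trigger further allocations, which is why fragmentation should not occur in this scheme.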

I think the ifdefs are unnecessary because both compile to the same instructions ([see this in godbolt](https://godbolt.org/z/nvhv1bT1v)). You could simply use the `_mm256_insertf128_si256` version everywhere.

@cotwitch If you look at that model with `gguf-dump.py`, you will see that it has the tensor `output.weight` duplicated. Not sure how that happened, but that's not a valid model.

Sorry, I do not have any insight into how that may have happened. I guess it is a bug in the conversion script, and the gguf-py library should have prevented...

```
q4_3
42.94 seconds per pass - ETA 7.81 hours
prompt eval time = 54411.09 ms / 631 tokens ( 86.23 ms per token)
bs=512
prompt eval time = 59126.51...
```

Adding my first impressions here as well. I had some compile errors on my system:
```
stable-diffusion.cpp/stable-diffusion.cpp: In function ‘void copy_ggml_tensor(ggml_tensor*, const ggml_tensor*)’:
stable-diffusion.cpp/stable-diffusion.cpp:171:5: error: ‘memcpy’ was not declared in...
```

I am using `gcc (Ubuntu 12.3.0-1ubuntu1~23.04) 12.3.0`, which should be the current version of GCC in Ubuntu-latest.

The behavior differs depending on the GPU backend being used. Since it is a Mali GPU, I assume that you are using OpenCL, is that correct?