Wagner Bruna

Results: 84 comments by Wagner Bruna

> How much value would it be if llama.cpp exported the mmap stuff as a library?

I don't think it'd help that much right now. The mmap part itself is...

> Have you experimented with MMaping then copying to GPU?

In my experience, I've restricted mmapping to CPU inference & loading only. mmap -> copy to GPU became a bottleneck...

You need to use the flag `--diffusion-model` instead of `-m`; see https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/z_image.md for an example.
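As a rough illustration of the flag difference, an invocation could look something like the following (the model file name, prompt, and output path are placeholders, not real release artifacts; check the linked z_image.md for the exact flags your build supports):

```shell
# Load the diffusion transformer with --diffusion-model instead of -m.
# The GGUF file name below is a placeholder.
./sd --diffusion-model ./z_image-q8_0.gguf \
     -p "a photo of a cat" \
     -o output.png
```

With `-m`, the loader expects a full checkpoint bundle, which is why pointing it at a standalone diffusion model file fails.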

That _does_ look a bit like a circuit board...

> SD_API void sd_set_abort_callback(sd_abort_cb_t cb, void* data);

I'd avoid that name in any case, because `ggml_set_abort_callback` does something entirely different: https://github.com/ggml-org/ggml/blob/2d3876d5/src/ggml.c#L205 https://github.com/ggml-org/ggml/blob/master/include/ggml.h#L339 The unrelated `bool (*ggml_abort_callback)(void *)` pointer is indeed...

This could be caused by a badly quantized file. Are you able to compare these results with Q4 quants? With `--offload-to-cpu`, your 6 GB card should be able to handle it.

Interesting. I was suspecting something related to the lower quants, but at least q4_0 should work. I can't check leejet's quants right now, but at least my own q4_0 works...

> This is the command I used to generate the q4_0 weights.

I confirm this quantization also works for me, both on Vulkan+radv and ROCm (and as expected, with better...

Reproduced on ROCm. I'll prepare a patch.