Ben Jackson
With this patch, prompt processing performance on gfx908 with FA=1 is almost exactly equal to FA=0 (without the patch, FA=1 is both slower and scales worse). Token generation is somewhere...
Here's the output:

```
~/hjc4869-llama.cpp$ ./build/bin/test-backend-ops -o FLASH_ATTN_EXT
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Instinct MI100, gfx908:sramecc+:xnack- (0x908), VMM: no, Wave Size:...
```
Maybe this is obvious, but the failing cases above are just all of the cases where:

```
#if defined(GGML_HIP_ROCWMMA_FATTN)
    if (fp16_mma_available(cc) && dst->src[0]->ne[1] > 8) {
        ggml_cuda_flash_attn_ext_wmma_f16(ctx, dst);
        return;
    }
...
```
I pulled the latest changes and I can confirm that the test doesn't crash; however, the garbage output now appears even at small prompt sizes. It also improved (garbage) token...
I must preface this by saying: I have no idea what I'm doing. But I think the issue is that the CDNA wave size is 64 (see https://rocm.docs.amd.com/projects/rocWMMA/en/latest/api-reference/api-reference-guide.html). By "test...
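For illustration only, here is a minimal sketch (in Python, with hypothetical names, not llama.cpp code) of the kind of wave-size lookup this implies: CDNA parts like gfx908 run a wavefront of 64 lanes, while RDNA parts default to 32, so any kernel tuned for a 32-wide warp needs different handling on CDNA.

```python
# Hypothetical helper, not part of llama.cpp: map a ROCm gfx target
# string to its wavefront ("warp") size. GCN/CDNA architectures
# (e.g. gfx908 on the MI100) use wave64; RDNA (gfx10xx and later)
# defaults to wave32.
def wave_size(gfx_target: str) -> int:
    arch = gfx_target.split(":")[0]  # drop feature flags like ":sramecc+:xnack-"
    if arch.startswith(("gfx10", "gfx11", "gfx12")):
        return 32   # RDNA default
    return 64       # GCN / CDNA

print(wave_size("gfx908:sramecc+:xnack-"))  # 64 (MI100, CDNA)
print(wave_size("gfx1100"))                 # 32 (RDNA3)
```

A dispatch layer that assumed warp size 32 everywhere would then pick fragment shapes that are wrong for gfx908, which would be consistent with the garbage output seen here.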
@IMbackK I'm happy to try to clean up this diff, but I don't know anything about llama.cpp internals (or CUDA, or...), so I'd need some coaching. Here's how I approached...
I looked into the code when reporting #14996 and the logic for selecting kernels is fairly simplistic. My issue was that ROCm "device capability" can't be interpreted in the same...
Looks like vLLM builds its own base image by apt-installing ROCm pieces. But I did file an issue against rocm/pytorch, because it's strange that pytorch/pytorch does not have this...
I can confirm that the provisioning script ran. Here are the two copies of `t5xxl` on my system:

```
root@4ceeaaa86d13:/var/log# find / -name t5xxl_fp16.safetensors
/workspace/ComfyUI/models/text_encoders/t5/t5xxl_fp16.safetensors
/workspace/storage/stable_diffusion/models/clip/t5xxl_fp16.safetensors
```

So it looks...
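To tell whether two paths like these are one file linked twice or two independent copies on disk, comparing inodes and checksums is enough. A small sketch (the real model paths are stand-ins here; the demo runs on temporary files):

```python
import hashlib
import os
import tempfile

def same_file(a: str, b: str) -> bool:
    """True if a and b resolve to the same inode (hardlink or symlink)."""
    return os.path.samefile(a, b)

def sha256(path: str) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on temp files standing in for the two t5xxl_fp16.safetensors paths.
with tempfile.TemporaryDirectory() as d:
    p1 = os.path.join(d, "a.safetensors")
    p2 = os.path.join(d, "b.safetensors")
    for p in (p1, p2):
        with open(p, "wb") as f:
            f.write(b"\x00" * 1024)
    print(same_file(p1, p2))           # False: two separate inodes
    print(sha256(p1) == sha256(p2))    # True: identical content (a true copy)
```

Two separate inodes with identical hashes means the provisioner copied the model rather than linking it, which doubles the disk usage.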
If I run `/opt/ai-dock/storage_monitor/bin/storage-monitor`, it creates the symlinks, which made me realize that there appear to be two different ComfyUI installations in this container, one at `/opt/ComfyUI` and one in `/workspace/ComfyUI`....
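A quick way to see which installation a given path really belongs to is to check whether it is a symlink and where it resolves. A minimal sketch (the `/opt` vs `/workspace` layout is mimicked with a temp directory; names are illustrative):

```python
import os
import tempfile

def describe(path: str) -> str:
    """Report whether path is a symlink and its fully resolved target."""
    kind = "symlink" if os.path.islink(path) else "regular"
    return f"{path}: {kind} -> {os.path.realpath(path)}"

# Demo: a storage directory linked into a second location, mimicking
# how storage-monitor links /workspace/storage into an install tree.
with tempfile.TemporaryDirectory() as d:
    storage = os.path.join(d, "storage", "models")
    os.makedirs(storage)
    link = os.path.join(d, "ComfyUI_models")
    os.symlink(storage, link)
    print(os.path.islink(link))                                  # True
    print(os.path.realpath(link) == os.path.realpath(storage))   # True
```

Running something like this over both install trees would show which one is backed by `/workspace/storage` and which one is a plain, independent copy.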