Ben Jackson
With this patch, prompt processing performance on gfx908 with FA=1 is almost exactly equal to FA=0 (without the patch, FA=1 is both slower and scales worse). Token generation is somewhere...
Here's the output:

```
~/hjc4869-llama.cpp$ ./build/bin/test-backend-ops -o FLASH_ATTN_EXT
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Instinct MI100, gfx908:sramecc+:xnack- (0x908), VMM: no, Wave Size:...
```
Maybe this is obvious, but the failing cases above are just all of the cases where:

```
#if defined(GGML_HIP_ROCWMMA_FATTN)
    if (fp16_mma_available(cc) && dst->src[0]->ne[1] > 8) {
        ggml_cuda_flash_attn_ext_wmma_f16(ctx, dst);
        return;
    }
...
```
I pulled the latest changes and I can confirm that the test doesn't crash; however, the garbage output now appears even at small prompt sizes. It also improved (garbage) token...
I must preface this by saying: I have no idea what I'm doing. But I think the issue is that the CDNA wave size is 64 (see https://rocm.docs.amd.com/projects/rocWMMA/en/latest/api-reference/api-reference-guide.html). By "test...
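For illustration only, here is a minimal sketch (in Python, with hypothetical names, not llama.cpp code) of the kind of wave-size lookup this implies: CDNA parts like gfx908 run a wavefront of 64 lanes, while RDNA parts default to 32, so any kernel tuned for a 32-wide warp needs different handling on CDNA.

```python
# Hypothetical helper, not part of llama.cpp: map a ROCm gfx target
# string to its wavefront ("warp") size. GCN/CDNA architectures
# (e.g. gfx908 on the MI100) use wave64; RDNA (gfx10xx and later)
# defaults to wave32.
def wave_size(gfx_target: str) -> int:
    arch = gfx_target.split(":")[0]  # drop feature flags like ":sramecc+:xnack-"
    if arch.startswith(("gfx10", "gfx11", "gfx12")):
        return 32   # RDNA default
    return 64       # GCN / CDNA

print(wave_size("gfx908:sramecc+:xnack-"))  # 64 (MI100, CDNA)
print(wave_size("gfx1100"))                 # 32 (RDNA3)
```

A dispatch layer that assumed warp size 32 everywhere would then pick fragment shapes that are wrong for gfx908, which would be consistent with the garbage output seen here.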
@IMbackK I'm happy to try to clean up this diff, but I don't know anything about llama.cpp internals (or CUDA, or...), so I'd need some coaching. Here's how I approached...
I looked into the code when reporting #14996 and the logic for selecting kernels is fairly simplistic. My issue was that ROCm "device capability" can't be interpreted in the same...
Looks like vLLM builds its own base image by apt-installing ROCm pieces. But I did file an issue against rocm/pytorch, because it's strange that pytorch/pytorch does not have this...
I can confirm that the provisioning script ran. Here are the two copies of `t5xxl` on my system:

```
root@4ceeaaa86d13:/var/log# find / -name t5xxl_fp16.safetensors
/workspace/ComfyUI/models/text_encoders/t5/t5xxl_fp16.safetensors
/workspace/storage/stable_diffusion/models/clip/t5xxl_fp16.safetensors
```

So it looks...
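To tell whether two paths like these are one file linked twice or two independent copies on disk, comparing inodes and checksums is enough. A small sketch (the real model paths are stand-ins here; the demo runs on temporary files):

```python
import hashlib
import os
import tempfile

def same_file(a: str, b: str) -> bool:
    """True if a and b resolve to the same inode (hardlink or symlink)."""
    return os.path.samefile(a, b)

def sha256(path: str) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on temp files standing in for the two t5xxl_fp16.safetensors paths.
with tempfile.TemporaryDirectory() as d:
    p1 = os.path.join(d, "a.safetensors")
    p2 = os.path.join(d, "b.safetensors")
    for p in (p1, p2):
        with open(p, "wb") as f:
            f.write(b"\x00" * 1024)
    print(same_file(p1, p2))           # False: two separate inodes
    print(sha256(p1) == sha256(p2))    # True: identical content (a true copy)
```

Two separate inodes with identical hashes means the provisioner copied the model rather than linking it, which doubles the disk usage.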
If I run `/opt/ai-dock/storage_monitor/bin/storage-monitor`, it creates the symlinks, which made me realize that there appear to be two different ComfyUI installations in this container, one at `/opt/ComfyUI` and one in `/workspace/ComfyUI`....
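A quick way to see which installation a given path really belongs to is to check whether it is a symlink and where it resolves. A minimal sketch (the `/opt` vs `/workspace` layout is mimicked with a temp directory; names are illustrative):

```python
import os
import tempfile

def describe(path: str) -> str:
    """Report whether path is a symlink and its fully resolved target."""
    kind = "symlink" if os.path.islink(path) else "regular"
    return f"{path}: {kind} -> {os.path.realpath(path)}"

# Demo: a storage directory linked into a second location, mimicking
# how storage-monitor links /workspace/storage into an install tree.
with tempfile.TemporaryDirectory() as d:
    storage = os.path.join(d, "storage", "models")
    os.makedirs(storage)
    link = os.path.join(d, "ComfyUI_models")
    os.symlink(storage, link)
    print(os.path.islink(link))                                  # True
    print(os.path.realpath(link) == os.path.realpath(storage))   # True
```

Running something like this over both install trees would show which one is backed by `/workspace/storage` and which one is a plain, independent copy.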