DocShotgun

32 comments by DocShotgun

Another thing to note with this is that the hardcoded `num_warps` of 32 causes errors when trying to run on AMD Instinct accelerators, since they have a warp size of...

The hardcoded `num_warps` of 32 causes [this error](https://github.com/linkedin/Liger-Kernel/issues/231) on AMD Instinct accelerators, which I managed to avoid on my end by reducing it to 16 instead. I was subsequently able...
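For context, a minimal sketch of deriving `num_warps` from the device instead of hardcoding it (`pick_num_warps` is a hypothetical helper, not part of Liger-Kernel; the arithmetic assumes the failure is the threads-per-block limit). On AMD Instinct (CDNA) the wavefront size is 64, so `num_warps=32` requests 32 × 64 = 2048 threads per block, over the 1024-thread limit, while `num_warps=16` keeps it at 1024, consistent with the fix above:

```python
import torch

# Pick num_warps so num_warps * warp_size stays within the usual
# 1024-threads-per-block limit. ROCm builds of torch set torch.version.hip.
def pick_num_warps(threads_per_block: int = 1024) -> int:
    warp_size = 64 if torch.version.hip else 32  # AMD wavefront = 64, NVIDIA warp = 32
    return max(1, threads_per_block // warp_size)

# kernel[grid](..., num_warps=pick_num_warps())
```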

Model moving is meant to load only the needed parts of the model into VRAM while they are being used. Without model moving, you wouldn't be able to generate...
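A minimal sketch of the idea (not the project's actual implementation): keep the weights on the CPU and move each layer to the GPU only for its forward pass, so a model larger than VRAM can still run.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def offloaded_forward(layers: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    x = x.to("cuda")
    for layer in layers:
        layer.to("cuda")   # load just this layer into VRAM
        x = layer(x)
        layer.to("cpu")    # evict it to make room for the next layer
    return x
```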

I just attempted to train using axolotl on an instance with 8x MI300X and torch 2.4.0+ROCm6.1, and got this error. Not sure if anyone here has gotten Liger-Kernel to run on AMD?...

The Triton version is 3.0.0. I'm also running flash-attn 2.6.3 (built for gfx942 arch on torch 2.4.0+ROCm6.1), but I'm not sure if that's relevant. Unfortunately I don't have a minimal...

This is very much needed - Brave as a search backend is effectively unusable because of this. I suspect that it's due to the model generating more than 1 search query...

FYI, setting max concurrent requests to 1 does not prevent this error from occurring, because the backend's limit is specifically 1 request per *second*, not 1 concurrent request.
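Since the limit is per second rather than per connection, the workaround is a client-side throttle rather than a concurrency cap. A minimal sketch (`OncePerSecond` and `do_search` are hypothetical names, not the project's API):

```python
import time

class OncePerSecond:
    """Space calls at least min_interval seconds apart, regardless of concurrency."""
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

# throttle = OncePerSecond()
# throttle.wait(); response = do_search(query)  # do_search is hypothetical
```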

Hello, and thank you for this! Finally an inference backend that can fully utilize my dual Xeon + single GPU setup. I was able to download the Deepseek-v3-0324 quants from...

> [@ovowei](https://github.com/ovowei) I guess we haven't supported RTN int4 or int8 converted from fp8 yet, as we need to write the fp8 dequantization, which is missing. And it seems the int8 support...
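For what it's worth, a per-tensor fp8 dequantization is conceptually just an upcast plus a scale multiply. A rough PyTorch sketch (assumes torch 2.1+ for `torch.float8_e4m3fn` and a single per-tensor scale; real checkpoints may use per-channel or per-block scales instead):

```python
import torch

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # fp8 tensors can't be used in most ops directly; upcast, then rescale
    return w_fp8.to(torch.float16) * scale.to(torch.float16)

w_fp8 = torch.randn(4, 4).to(torch.float8_e4m3fn)
scale = torch.tensor(0.05)
w_fp16 = dequantize_fp8(w_fp8, scale)
```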

> There is a bug where the `kt-amx-method` choice comes from the env directly, which means the command-line arg is not working. But since your output is not nonsense words,...
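The usual fix for that pattern is to read the env var only as the argparse default, so a command-line value still wins. Illustrative only: the flag wiring and the `KT_AMX_METHOD` env var name here are assumptions, not the project's actual code.

```python
import argparse
import os

# Reading the env var *directly* at the use site ignores the CLI flag;
# using it only as the default restores the expected precedence:
# CLI flag > env var > built-in default.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--kt-amx-method",
    default=os.environ.get("KT_AMX_METHOD", "auto"),
)
args = parser.parse_args()
print(args.kt_amx_method)  # the CLI value wins whenever the flag is passed
```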