DocShotgun

32 comments by DocShotgun

Another thing to note with this is that the hardcoded `num_warps` of 32 causes errors when trying to run on AMD Instinct accelerators, since they have a warp size of...

The hardcoded `num_warps` of 32 causes [this error](https://github.com/linkedin/Liger-Kernel/issues/231) on AMD Instinct accelerators, which I managed to avoid on my end by reducing it to 16 instead. I was subsequently able...
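For context, a minimal sketch of deriving `num_warps` from the device instead of hardcoding it (`pick_num_warps` is a hypothetical helper, not part of Liger-Kernel; the arithmetic assumes the failure is the threads-per-block limit). On AMD Instinct (CDNA) the wavefront size is 64, so `num_warps=32` requests 32 × 64 = 2048 threads per block, over the 1024-thread limit, while `num_warps=16` keeps it at 1024, consistent with the fix above:

```python
import torch

# Pick num_warps so num_warps * warp_size stays within the usual
# 1024-threads-per-block limit. ROCm builds of torch set torch.version.hip.
def pick_num_warps(threads_per_block: int = 1024) -> int:
    warp_size = 64 if torch.version.hip else 32  # AMD wavefront = 64, NVIDIA warp = 32
    return max(1, threads_per_block // warp_size)

# kernel[grid](..., num_warps=pick_num_warps())
```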

Model moving is meant to load only the needed parts of the model into VRAM while they are being used. Without model moving, you wouldn't be able to generate...
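A minimal sketch of the idea (not the project's actual implementation): keep the weights on the CPU and move each layer to the GPU only for its forward pass, so a model larger than VRAM can still run.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def offloaded_forward(layers: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    x = x.to("cuda")
    for layer in layers:
        layer.to("cuda")   # load just this layer into VRAM
        x = layer(x)
        layer.to("cpu")    # evict it to make room for the next layer
    return x
```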

I just attempted to train using axolotl on an instance with 8x MI300X and torch 2.4.0+ROCm6.1, and got this error. Not sure if anyone here has gotten Liger-Kernel to run on AMD?...

The Triton version is 3.0.0. I'm also running flash-attn 2.6.3 (built for gfx942 arch on torch 2.4.0+ROCm6.1), but I'm not sure if that's relevant. Unfortunately I don't have a minimal...

This is very much needed - Brave as a search backend is effectively unusable because of this. I suspect that it's due to the model generating more than 1 search query...

FYI, setting max concurrent requests to 1 does not prevent this error from occurring, because the backend's limit is specifically 1 request per *second*, not 1 concurrent request.
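Since the limit is per second rather than per connection, the workaround is a client-side throttle rather than a concurrency cap. A minimal sketch (`OncePerSecond` and `do_search` are hypothetical names, not the project's API):

```python
import time

class OncePerSecond:
    """Space calls at least min_interval seconds apart, regardless of concurrency."""
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

# throttle = OncePerSecond()
# throttle.wait(); response = do_search(query)  # do_search is hypothetical
```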

Hello, and thank you for this! Finally an inference backend that can fully utilize my dual Xeon + single GPU setup. I was able to download the Deepseek-v3-0324 quants from...

> [@ovowei](https://github.com/ovowei) I guess we haven't supported RTN int4 or int8 converted from fp8 yet, as we need to write the fp8 dequantization, which is missing. And it seems the int8 support...
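For what it's worth, a per-tensor fp8 dequantization is conceptually just an upcast plus a scale multiply. A rough PyTorch sketch (assumes torch 2.1+ for `torch.float8_e4m3fn` and a single per-tensor scale; real checkpoints may use per-channel or per-block scales instead):

```python
import torch

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # fp8 tensors can't be used in most ops directly; upcast, then rescale
    return w_fp8.to(torch.float16) * scale.to(torch.float16)

w_fp8 = torch.randn(4, 4).to(torch.float8_e4m3fn)
scale = torch.tensor(0.05)
w_fp16 = dequantize_fp8(w_fp8, scale)
```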

> There is a bug where the `kt-amx-method` choice comes from the env directly, which means the command-line arg is not working. But since your output is not nonsense words,...
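The usual fix for that pattern is to read the env var only as the argparse default, so a command-line value still wins. Illustrative only: the flag wiring and the `KT_AMX_METHOD` env var name here are assumptions, not the project's actual code.

```python
import argparse
import os

# Reading the env var *directly* at the use site ignores the CLI flag;
# using it only as the default restores the expected precedence:
# CLI flag > env var > built-in default.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--kt-amx-method",
    default=os.environ.get("KT_AMX_METHOD", "auto"),
)
args = parser.parse_args()
print(args.kt_amx_method)  # the CLI value wins whenever the flag is passed
```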