DocShotgun
> Which GPU weights are you using? (The fp8+GPTQ4 GPU weights from here: https://modelscope.cn/models/ApproachingAI2024/DeepSeek-V3-0324-GPU-weight/files ?) > > And thanks for your discovery and help. We are going to check this...
> Which ktransformers commit was used to run all of the above experiments? If you mean my testing where I ran into the slight incoherence issues with DeepSeek V3 0324, I was...
> Thank you for the information. I am wondering how much CPU memory is needed to run DeepSeek V3 0324. Is 256 GB enough? Definitely not. For the full fp8+int8...
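As a rough, hedged back-of-envelope check (assuming the full ~671B-parameter model and roughly one byte per parameter for fp8/int8 storage), the weights alone already dwarf 256 GB:

```python
# Back-of-envelope memory estimate, assuming ~671B total parameters for
# DeepSeek V3 and ~1 byte per parameter for fp8/int8 weight storage.
# This ignores the KV cache and runtime buffers, so real usage is higher.
total_params = 671e9
bytes_per_param = 1.0
weights_gb = total_params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights alone vs. 256 GB of system RAM")
```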
> To help us reproduce and look into the AttributeError when loading GPU weights, could you share more info (full error log, model/weight format, launch command, etc.)? Launch command: ```...
Roger that, updated my `config.json` and added that block, keeping the `config_groups` key with the GPTQ params. Loads and infers again at around 13 T/s on my setup (dual Xeon...
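The exact block isn't reproduced here, but as a minimal sketch of the edit being described (assuming the block belongs under `quantization_config` alongside `config_groups`), `config.json` can be patched without disturbing the existing GPTQ parameters:

```python
# Hedged sketch: merge an extra block into config.json while leaving the
# existing quantization_config["config_groups"] (GPTQ params) untouched.
# `extra_block` is a placeholder; copy the actual block from the thread.
import json

with open("config.json") as f:
    cfg = json.load(f)

extra_block = {}  # not reproduced here
cfg.setdefault("quantization_config", {}).update(extra_block)

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```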
Maybe the deepgemm issue has been solved by now, as it was around a month ago when I tested it. But regardless, the solution is simply not to use deepgemm, since it...
https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/1807#issuecomment-2346805239
The model moving is meant to only load the needed parts of the model into VRAM when they are being used. Without it, even a 4090 has too little VRAM...
> > The model moving is meant to only load the needed parts of the model into VRAM when they are being used. Without it, even a 4090 has too...
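Not Forge's actual implementation, but a minimal sketch of the on-demand model-moving idea described above (keep weights in system RAM and pull each module into VRAM only while it runs):

```python
# Minimal sketch of on-demand model moving: a module lives in CPU RAM and
# is moved to the GPU only for the duration of its forward pass.
import torch
import torch.nn as nn

def run_offloaded(module: nn.Module, x: torch.Tensor, device: str = "cuda") -> torch.Tensor:
    module.to(device)                    # load this part of the model into VRAM
    try:
        with torch.no_grad():
            return module(x.to(device))  # run it on the GPU
    finally:
        module.to("cpu")                 # move it back out to free VRAM
        torch.cuda.empty_cache()         # release the cached allocations
```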
I think it's probably better to just remove the print statements for each skipped layer, since the `resume_layer` is already printed at the start if it's relevant. Adding another option...
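A hedged sketch of the suggested behaviour (names like `load_layers` and `process` are illustrative, not the project's): announce the resume point once and skip the earlier layers silently.

```python
# Illustrative only: print the resume point once instead of emitting a line
# for every skipped layer. `process` stands in for the real per-layer work.
from typing import Any, Callable, Iterable

def load_layers(layers: Iterable[Any], process: Callable[[Any], None],
                resume_layer: int = 0) -> None:
    if resume_layer > 0:
        print(f"Resuming from layer {resume_layer}")
    for i, layer in enumerate(layers):
        if i < resume_layer:
            continue  # skip quietly; no per-layer print
        process(layer)
```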