DocShotgun
> Which GPU weights are you using? (The fp8+GPTQ4 GPU weights from here: https://modelscope.cn/models/ApproachingAI2024/DeepSeek-V3-0324-GPU-weight/files ?) > > And thanks for your discovery and help. We are going to check this...
> Which ktransformers commit was used to run all of the above experiments? If you mean my testing where I ran into the slight incoherence issues with DeepSeek V3 0324, I was...
> Thank you for the information. I am wondering how much CPU memory is needed to run DeepSeek V3 0324. Is 256 GB enough? Definitely not. For the full fp8+int8...
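As a rough, hedged back-of-envelope check (assuming the full ~671B-parameter model and roughly one byte per parameter for fp8/int8 storage), the weights alone already dwarf 256 GB:

```python
# Back-of-envelope memory estimate, assuming ~671B total parameters for
# DeepSeek V3 and ~1 byte per parameter for fp8/int8 weight storage.
# This ignores the KV cache and runtime buffers, so real usage is higher.
total_params = 671e9
bytes_per_param = 1.0
weights_gb = total_params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights alone vs. 256 GB of system RAM")
```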
> To help us reproduce and look into the AttributeError when loading GPU weights, could you share more info (full error log, model/weight format, launch command, etc.)? Launch command: ```...
Roger that, updated my `config.json` and added that block, keeping the `config_groups` key with the GPTQ params. Loads and infers again at around 13 T/s on my setup (dual Xeon...
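The exact block isn't reproduced here, but as a minimal sketch of the edit being described (assuming the block belongs under `quantization_config` alongside `config_groups`), `config.json` can be patched without disturbing the existing GPTQ parameters:

```python
# Hedged sketch: merge an extra block into config.json while leaving the
# existing quantization_config["config_groups"] (GPTQ params) untouched.
# `extra_block` is a placeholder; copy the actual block from the thread.
import json

with open("config.json") as f:
    cfg = json.load(f)

extra_block = {}  # not reproduced here
cfg.setdefault("quantization_config", {}).update(extra_block)

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```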
Maybe the deepgemm issue has been solved by now, as it was around a month ago when I tested it. But regardless, the solution is simply not to use deepgemm, since it...
https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/1807#issuecomment-2346805239
The model moving is meant to only load the needed parts of the model into VRAM when they are being used. Without it, even a 4090 has too little VRAM...
> > The model moving is meant to only load the needed parts of the model into VRAM when they are being used. Without it, even a 4090 has too...
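Not Forge's actual implementation, but a minimal sketch of the on-demand model-moving idea described above (keep weights in system RAM and pull each module into VRAM only while it runs):

```python
# Minimal sketch of on-demand model moving: a module lives in CPU RAM and
# is moved to the GPU only for the duration of its forward pass.
import torch
import torch.nn as nn

def run_offloaded(module: nn.Module, x: torch.Tensor, device: str = "cuda") -> torch.Tensor:
    module.to(device)                    # load this part of the model into VRAM
    try:
        with torch.no_grad():
            return module(x.to(device))  # run it on the GPU
    finally:
        module.to("cpu")                 # move it back out to free VRAM
        torch.cuda.empty_cache()         # release the cached allocations
```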
I think it's probably better to just remove the print statements for each skipped layer, since the `resume_layer` is already printed at the start if it's relevant. Adding another option...
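A hedged sketch of the suggested behaviour (names like `load_layers` and `process` are illustrative, not the project's): announce the resume point once and skip the earlier layers silently.

```python
# Illustrative only: print the resume point once instead of emitting a line
# for every skipped layer. `process` stands in for the real per-layer work.
from typing import Any, Callable, Iterable

def load_layers(layers: Iterable[Any], process: Callable[[Any], None],
                resume_layer: int = 0) -> None:
    if resume_layer > 0:
        print(f"Resuming from layer {resume_layer}")
    for i, layer in enumerate(layers):
        if i < resume_layer:
            continue  # skip quietly; no per-layer print
        process(layer)
```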