Xingwei Tan
I'm dealing with the same issue on a (single) MI300X with ROCm 6.2.1. The issue is likely related to the quantization: the models work completely fine when I load the **fp16**...
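For reference, a minimal sketch of the two load paths I was comparing (the model id is a placeholder, and bitsandbytes on ROCm needs a ROCm-enabled build):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id

# fp16 load path -- this is the case that works fine for me.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# 4-bit (bitsandbytes) load path -- this is where the errors show up.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```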
> Single or multi-GPU?

Those errors occurred when I was using one MI300X.
Based on my experience, AirLLM is much slower. Its VRAM usage is low, so it cannot fully utilize the available resources, and I haven't found any way to change that.
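As a rough way to quantify that, here is a minimal sketch using plain PyTorch (no AirLLM-specific API assumed) to report how much of the card is actually in use around the generate call:

```python
import torch

def report_vram(tag: str, device: int = 0) -> None:
    # Compare what PyTorch has allocated/reserved against the card's total memory.
    total = torch.cuda.get_device_properties(device).total_memory
    allocated = torch.cuda.memory_allocated(device)
    reserved = torch.cuda.memory_reserved(device)
    print(f"[{tag}] allocated {allocated / 1e9:.1f} GB / "
          f"reserved {reserved / 1e9:.1f} GB / total {total / 1e9:.1f} GB")

# Call report_vram("before generate") and report_vram("after generate") around
# the AirLLM generate call to see how little of the MI300X is actually used.
```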
I have the same question. Is it possible to add a parameter to control how much VRAM we intend to use, which might speed up inference?