Xingwei Tan

Results: 4 comments by Xingwei Tan

I'm dealing with the same issue with a (single) MI300X and ROCm 6.2.1. The issue is likely related to quantization: the models work completely fine when I load the **fp16**...
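For context, a minimal sketch of the workaround described above, assuming the models are loaded with Hugging Face `transformers` (the model id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute the affected model

# Loading the plain fp16 weights, with no quantization config, is the
# setup that reportedly works fine on a single MI300X with ROCm 6.2.1.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights instead of a quantized variant
    device_map="auto",
)
```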

> Single or multi-GPU?

Those errors occurred when I was using a single MI300X.

In my experience, AirLLM is much slower. Its VRAM usage is low, so it cannot fully utilize the available resources, and I haven't found any way to change that.

I have the same question. Would it be possible to add a parameter to control how much VRAM we want to use? That might speed up inference.
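For reference, this kind of knob already exists elsewhere: a sketch of how `transformers`/`accelerate` cap per-device memory via `max_memory` (the budgets below are placeholders), which is roughly the control being requested here for AirLLM:

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder

# `max_memory` tells the `device_map="auto"` planner how much memory it may
# use per device; allowing more layers to stay resident in VRAM generally
# means fewer CPU/disk round-trips and faster inference.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "160GiB", "cpu": "64GiB"},  # placeholder budgets
)
```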