Xingwei Tan
I'm dealing with the same issue on a (single) MI300X with ROCm 6.2.1. The issue is likely related to the quantization: the models work completely fine when I load the **fp16**...
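For reference, a minimal sketch of the two load paths I was comparing (the model id is a placeholder, and bitsandbytes on ROCm needs a ROCm-enabled build):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id

# fp16 load path -- this is the case that works fine for me.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# 4-bit (bitsandbytes) load path -- this is where the errors show up.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```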
> Single or multi-GPU?

Those errors occurred when I was using one MI300X.
Based on my experience, AirLLM is much slower. Its VRAM usage is low, so it cannot fully utilize the available resources, and I haven't found any way to change that.
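As a rough way to quantify that, here is a minimal sketch using plain PyTorch (no AirLLM-specific API assumed) to report how much of the card is actually in use around the generate call:

```python
import torch

def report_vram(tag: str, device: int = 0) -> None:
    # Compare what PyTorch has allocated/reserved against the card's total memory.
    total = torch.cuda.get_device_properties(device).total_memory
    allocated = torch.cuda.memory_allocated(device)
    reserved = torch.cuda.memory_reserved(device)
    print(f"[{tag}] allocated {allocated / 1e9:.1f} GB / "
          f"reserved {reserved / 1e9:.1f} GB / total {total / 1e9:.1f} GB")

# Call report_vram("before generate") and report_vram("after generate") around
# the AirLLM generate call to see how little of the MI300X is actually used.
```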
I have the same question. Is it possible to add a parameter to control how much VRAM we intend to use, which might speed up inference?