bitsandbytes
I ran an NF4 72B model on 2x A6000 using llamafactory
System Info
For some reason, it seems really slow. My CPU usage is quite high (100%), while both GPUs have about half of their VRAM loaded and are reporting utilization.
Reproduction
When bnb loads, it says it is explicitly loading the 124 build (CUDA 12.4), which matches my NVIDIA toolkit version. Is it just really slow, or am I running the bnb part on the CPU? How can I check?
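One way to approach the "how can I check" part is to look at which device each parameter of the loaded model lives on. The following is only a sketch: `summarize_devices` is a hypothetical helper name, and it assumes `model` is the already loaded PyTorch module (anything exposing `named_parameters()`). For 4-bit weights the element counts refer to the packed storage, so treat the numbers as relative rather than exact.

```python
from collections import Counter


def summarize_devices(model):
    """Return {device: element count} for every parameter of `model`.

    `model` is assumed to be a loaded torch.nn.Module; any object that
    exposes named_parameters() with .device and .numel() works.
    """
    counts = Counter()
    for _name, param in model.named_parameters():
        counts[str(param.device)] += param.numel()
    return dict(counts)


# Usage, once the model is loaded:
#   print(summarize_devices(model))
#   # Models dispatched with a device_map also carry the placement map:
#   print(getattr(model, "hf_device_map", None))
```

If a `cpu` (or `meta`/`disk`) entry shows up in the result, part of the model was offloaded, which would be consistent with high CPU usage and low throughput. If everything is on `cuda:0` and `cuda:1`, the weights are on the GPUs and the slowdown has another cause.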
Expected behavior
It's slow, about 2 tokens/s. I expect that running on the GPUs should be much faster.