bitsandbytes

I ran an NF4 72B model on 2x A6000 using llamafactory

Open charleswg opened this issue 1 year ago • 0 comments

System Info

For some reason, generation seems really slow. My CPU usage is quite high (100%), while both GPUs have about half of their VRAM loaded and are reporting usage.

Reproduction

When bnb loads, it says it is explicitly loading the 124 build, which matches my NVIDIA toolkit version. Is it just really slow, or am I running the bnb part on the CPU? How can I check?
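One rough way to check is to count how many parameters sit on each device once the model is loaded. The sketch below is only an illustration: `params_per_device` and `offloaded_fraction` are hypothetical helper names, and it assumes the loaded model is a PyTorch-style module exposing `named_parameters()`. For 4-bit weights, `numel()` may report the packed size, so treat the numbers as approximate.

```python
from collections import Counter


def params_per_device(model):
    """Count parameter elements per device for any model exposing named_parameters()."""
    counts = Counter()
    for _name, param in model.named_parameters():
        # str() turns device objects into strings such as "cuda:0", "cpu" or "meta".
        counts[str(param.device)] += param.numel()
    return dict(counts)


def offloaded_fraction(counts):
    """Fraction of parameter elements that are not on a CUDA device."""
    total = sum(counts.values())
    off_gpu = sum(n for dev, n in counts.items() if not dev.startswith("cuda"))
    return off_gpu / total if total else 0.0
```

With a loaded model, `print(params_per_device(model))` should list only `cuda:0` and `cuda:1` if nothing was offloaded. Entries for `cpu` or `meta` would suggest that some layers are being kept off the GPUs, which could explain high CPU usage. If the model was loaded with a `device_map`, printing `model.hf_device_map` may show the same placement per layer, and running `python -m bitsandbytes` should print the library's own CUDA diagnostics.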

Expected behavior

It generates only about 2 tokens/s. I expected it to be much faster when running on the GPUs.

charleswg, Oct 15 '24 22:10