Daniele
The only difference between the official master branch and yours is that load time on the official one is lower (490 ms vs 697 ms)
I set up the Docker container, cloned your repo inside it, and recompiled, but I still see the same 0% GPU usage.
Those are the linked libraries:
linux-vdso.so.1 (0x00007fff8adee000)
libhipblas.so.0 => /opt/rocm/lib/libhipblas.so.0 (0x00007fd3d087f000)
libamdhip64.so.5 => /opt/rocm/lib/libamdhip64.so.5 (0x00007fd3cf200000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fd3cee00000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fd3cf118000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fd3d082a000)
libc.so.6 => /usr/lib/libc.so.6...
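In case it helps anyone else checking their build: a rough Python sketch that scans `ldd`-style output and lists which libraries resolve under `/opt/rocm/`. The function name `parse_ldd` and the `/opt/rocm/` prefix check are just my assumptions for illustration, and the sample text is copied from the complete entries in the listing above. Note that a binary linking against hipBLAS only shows it was built with it, not that the GPU is actually being used at runtime.

```python
# Sample taken from the ldd listing above (only the complete entries).
LDD_OUTPUT = """\
linux-vdso.so.1 (0x00007fff8adee000)
libhipblas.so.0 => /opt/rocm/lib/libhipblas.so.0 (0x00007fd3d087f000)
libamdhip64.so.5 => /opt/rocm/lib/libamdhip64.so.5 (0x00007fd3cf200000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fd3cee00000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fd3cf118000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fd3d082a000)
"""


def parse_ldd(text):
    """Map each library name to its resolved path (None if no path is shown)."""
    libs = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        name, path = parts[0], None
        if "=>" in parts:
            idx = parts.index("=>")
            # Only accept absolute paths; "=> not found" leaves path as None.
            if idx + 1 < len(parts) and parts[idx + 1].startswith("/"):
                path = parts[idx + 1]
        libs[name] = path
    return libs


libs = parse_ldd(LDD_OUTPUT)
# Assumption: the ROCm libraries live under /opt/rocm/, as in the listing.
rocm_libs = [name for name, path in libs.items()
             if path is not None and path.startswith("/opt/rocm/")]
print(rocm_libs)
```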
Ok, it works. Sorry for not trying this longer prompt first but I'm quite new to this and still don't understand a lot about it. Using this longer prompt evaluation...
Vicuna 13B q4_1
Once I get home I'll test more, but this already looks great.
I've noticed that in the perplexity test the hipBLAS version doesn't calculate anything. It hangs at 100% GPU usage and makes no progress. However it seems related to...
The perplexity test works perfectly with that flag on the 5700xt
Vicuna 13B-q4_1
`no blas: 49.66 seconds per pass - ETA 9.04 hours`
`hipblas: 16.44 seconds per pass - ETA 2.99 hours`
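For a quick sanity check of those two timings, here is a small Python sketch that works out the speedup and the number of passes each ETA implies. The variable names are mine, and it assumes the ETA is simply seconds per pass times the number of remaining passes, which I have not verified against the perplexity tool's source.

```python
# Timings reported above for Vicuna 13B-q4_1.
no_blas_s_per_pass = 49.66
hipblas_s_per_pass = 16.44
no_blas_eta_h = 9.04
hipblas_eta_h = 2.99

# Speedup of the hipBLAS build over the build without BLAS.
speedup = no_blas_s_per_pass / hipblas_s_per_pass

# Assumption: ETA = seconds per pass * number of passes.
passes_no_blas = no_blas_eta_h * 3600 / no_blas_s_per_pass
passes_hipblas = hipblas_eta_h * 3600 / hipblas_s_per_pass

print(f"speedup: {speedup:.2f}x")
print(f"implied passes: {passes_no_blas:.0f} (no blas), {passes_hipblas:.0f} (hipblas)")
```

Both ETAs imply roughly the same number of passes, so the two runs look like they cover the same workload and the per-pass times are comparable.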
I see what you mean. I just tested generation and it outputs gibberish. The perplexity score is also different.