baize-chatbot
very high CPU during inference. GPU seems to be idle.
I have tried the 8-bit option as well, but there is no change.
It generates tokens slowly, and CPU usage goes high (>80%). GPU usage jumps too, but always stays below 20%. So it seems to be CPU-bound rather than GPU-bound.
So, by default, does it run inference on the GPU?
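One quick way to check whether the model is actually landing on the GPU is to ask PyTorch directly; a minimal sketch (assuming the standard PyTorch stack the project uses). If CUDA isn't visible to PyTorch, generation silently falls back to the CPU, which would match the symptoms above:

```python
import torch

# Pick the device the way most loaders do: prefer CUDA when PyTorch
# can see it, otherwise fall back to CPU. A "cpu" result here would
# explain high CPU usage and a near-idle GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# After loading, you can also confirm where the weights ended up:
#   print(next(model.parameters()).device)   # expect cuda:0 on GPU
```

If this prints `cpu`, the fix is usually an environment issue (CUDA-enabled PyTorch build, visible GPU) rather than anything in the chatbot code.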
This seems to be a problem with int8. In our tests, it is indeed slower than fp16. We'll investigate this.
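For reference, the two loading paths being compared might look like this. This is only a sketch of typical `transformers.AutoModelForCausalLM.from_pretrained` arguments, not the repo's actual code, and the checkpoint name is a placeholder; the int8 path additionally requires the `bitsandbytes` package:

```python
import torch

MODEL = "your/baize-checkpoint"  # placeholder, not a real model id

# fp16 path: half-precision weights placed on the GPU -- the usual fast path.
fp16_kwargs = {"torch_dtype": torch.float16, "device_map": "auto"}

# int8 path: 8-bit quantization via bitsandbytes. It roughly halves GPU
# memory again, but the on-the-fly dequantization in the matmuls can make
# generation slower than plain fp16, consistent with the slowdown above.
int8_kwargs = {"load_in_8bit": True, "device_map": "auto"}

# Usage (downloads weights, so not run here):
# model = AutoModelForCausalLM.from_pretrained(MODEL, **fp16_kwargs)
print(fp16_kwargs, int8_kwargs)
```

In short, int8 trades memory for speed, so seeing it slower than fp16 is plausible even when everything is correctly on the GPU.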