text-generation-webui
Multiprocessing, Multithreading and Parallelism
So I noticed that when CPU offloading, only one core is used. This seems like a bottleneck. On FlexGen everything gets used, and the generations rival the GPU itself.
I could be wrong, but this is due to the Python GIL: it allows only one core to be consumed by Python, and htop confirms this.
They are trying to remove the lock, and there is already a no-GIL build of Python. There are also some libraries and coding tricks to get around it, e.g. external modules that don't require the GIL.
https://github.com/colesbury/nogil
It starts in 3.10, which is what we are already using. Thoughts on this? It could speed up offloading a bit?
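To illustrate the GIL point (a minimal stdlib sketch, not code from this repo): a pure-Python CPU-bound task gains nothing from extra threads because only one can hold the GIL at a time, while separate processes each get their own interpreter and GIL:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy_count(n):
    # Pure-Python CPU-bound loop: it holds the GIL the whole time,
    # so extra threads cannot run it in parallel.
    total = 0
    for i in range(n):
        total += i
    return total

def timed(executor_cls, jobs=4, n=2_000_000):
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as ex:
        results = list(ex.map(busy_count, [n] * jobs))
    return time.perf_counter() - start, results

if __name__ == "__main__":
    t_threads, _ = timed(ThreadPoolExecutor)   # serialized by the GIL
    t_procs, _ = timed(ProcessPoolExecutor)    # one GIL per process
    print(f"threads: {t_threads:.2f}s, processes: {t_procs:.2f}s")
```

On a multi-core box the process version finishes several times faster; C extensions that release the GIL (NumPy, most of PyTorch's kernels) are the other common workaround.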
I think it might even affect multi-GPU setups, because stuff like this should not be happening: https://github.com/henk717/KoboldAI/issues/295
If only a single core is ever used, the transfers probably run much slower.
Thoughts on this? Realistic or not?
FlexGen doesn't do the calculations on the CPU; it just creates an efficient schedule for sending layers to the GPU while keeping the inactive layers in a RAM/disk cache (if I understand correctly).
I suppose the offloading strategy implemented in accelerate (which is the one used in this repository) works the same way.
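If that understanding is right, the offload loop looks roughly like this sketch (`DummyLayer` and the helper names are hypothetical stand-ins, not accelerate's real classes; in practice the moves are `tensor.to("cuda")` / eviction of the weight copy):

```python
class DummyLayer:
    """Stand-in for a transformer block whose weights live in CPU RAM."""
    def __init__(self, idx):
        self.idx = idx
        self.on_gpu = False

    def to_gpu(self):
        # In the real code this is a host-to-device weight copy --
        # the expensive transfer this thread is worried about.
        self.on_gpu = True

    def to_cpu(self):
        # Evict so the next layer's weights fit in VRAM.
        self.on_gpu = False

    def forward(self, x):
        assert self.on_gpu, "weights must be resident before compute"
        return x + 1  # placeholder for the actual matmuls

def offloaded_forward(layers, x):
    # One layer resident at a time: transfer, compute, evict, repeat.
    for layer in layers:
        layer.to_gpu()
        x = layer.forward(x)
        layer.to_cpu()
    return x
```

The compute itself happens on the GPU either way; the question in this thread is whether the host-side transfer/scheduling step is stuck on one core.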
If I set the percent policy to 100 0 0 100 0 100 for FlexGen, it pegs all of the cores at 100% when generating.
Perhaps accelerate is locked to one thread. They list a way to use multiple CPUs with MPI, but that might be bare-metal only.
I see that GPTQ and RWKV can use all cores, so this is mainly an accelerate problem.
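As a quick experiment (my assumption, not a confirmed fix for accelerate): the intra-op thread caps that OpenMP/MKL read at startup can silently limit CPU work to one thread, and PyTorch exposes the same knob directly. A sketch of raising them before launch:

```python
import os

# These must be set before torch (or any BLAS-backed library) is
# imported; setdefault is used so an existing launcher config wins.
n_cores = str(os.cpu_count() or 1)
os.environ.setdefault("OMP_NUM_THREADS", n_cores)  # OpenMP pool size
os.environ.setdefault("MKL_NUM_THREADS", n_cores)  # MKL BLAS pool size

# The same knob from the PyTorch side, after import:
# import torch
# torch.set_num_threads(os.cpu_count())
# print(torch.get_num_threads())
```

If htop still shows one pegged core after this, the serialization is happening in Python-level scheduling code rather than in the math kernels.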
Any update on this? This is currently a huge bottleneck for me.
Hard to say. When I used multiple cores for the operations that supported it, it was often slower. But people report CPU bottlenecks all over the place.
Looking forward to it. Why is this closed?