Multiprocessing, Multithreading and Parallelism

Open Ph0rk0z opened this issue 1 year ago • 2 comments

I noticed that when offloading to the CPU, only one core is used. This seems like a bottleneck. FlexGen can use every core, and its generations rival the GPU itself.

I could be wrong, but I think this is due to the Python GIL. It allows only one thread to execute Python bytecode at a time, so pure-Python work ends up on a single core, and htop confirms this.
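To make the GIL effect concrete, here is a minimal sketch using only the standard library. It runs the same CPU-bound, pure-Python function with a thread pool and then with a process pool. The names `count_primes` and `run_with` and the job sizes are made up for illustration, and nothing here touches accelerate or FlexGen. On a standard CPython build the threaded run should keep roughly one core busy, while the process run can spread across cores. Timings vary by machine, so none are quoted.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Count primes below `limit` by trial division (pure Python, CPU-bound)."""
    count = 0
    for n in range(2, limit):
        # all() over an empty range is True, so 2 and 3 are counted as primes.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls, jobs, workers):
    """Run count_primes over every entry in `jobs` using the given executor class."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, jobs))


if __name__ == "__main__":
    jobs = [40_000] * 4  # four identical CPU-bound jobs (arbitrary size)
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        run_with(executor_cls, jobs, workers=4)
        elapsed = time.perf_counter() - start
        print(f"{executor_cls.__name__}: {elapsed:.2f}s")
```

Watching htop while this runs should show the difference. Note that this only applies to work done in Python bytecode: native extensions such as PyTorch kernels can release the GIL and run their own thread pools.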

They are trying to remove the lock, and there is already a no-GIL build of Python. There are also some libraries and coding tricks to get around it, e.g. external modules that do not require it.

https://github.com/colesbury/nogil

It starts in 3.10 which is what we are already using. Thoughts on this? It could speed up offloading a bit?
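For anyone who wants to check what their own interpreter is doing, a small sketch follows. It relies on `sys._is_gil_enabled()` and the `Py_GIL_DISABLED` build variable, which exist in the official free-threaded builds of CPython 3.13 and later. It is an assumption that the nogil fork linked above does not expose these, in which case it would be reported here as a standard build. `gil_status` is a made-up helper name.

```python
import sys
import sysconfig


def gil_status():
    """Describe whether this interpreter is running with the GIL."""
    # Py_GIL_DISABLED is 1 only for free-threaded builds (CPython 3.13+).
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; stock older interpreters always have the GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    if not free_threaded_build:
        return "standard build, GIL enabled"
    return "free-threaded build, GIL " + ("re-enabled" if gil_enabled else "disabled")


print(sys.version.split()[0], "->", gil_status())
```

Even on a free-threaded build, an extension module that has not declared support can cause the GIL to be switched back on at import time, which is why the runtime check is separate from the build check.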

I think it might even affect multi GPU setups because stuff like this should not be happening: https://github.com/henk717/KoboldAI/issues/295

If only a single core is ever used, the transfers probably go much slower.

Thoughts on this? Realistic or not?

Ph0rk0z avatar Mar 05 '23 18:03 Ph0rk0z

FlexGen doesn't do the calculations in the CPU, it just creates an efficient schedule for sending layers to the GPU while keeping the inactive layers in a RAM/disk cache (if I understand correctly).

I suppose the offloading strategy implemented in accelerate (which is the one used in this repository) works the same way.

oobabooga avatar Mar 05 '23 19:03 oobabooga

If I put 100 0 0 100 0 100 for FlexGen, it pegs all of the cores at 100% when generating.

Perhaps accelerate is locked to one thread. They list a way to use multiple CPUs with MPI, but that might be bare metal only.
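One way to test the "locked to one thread" guess is to print the settings that commonly cap CPU threads for a process. This is only a sketch: `thread_report` is a made-up helper, it does not inspect accelerate itself, and the PyTorch part is skipped if torch is not installed. `torch.get_num_threads()` and `torch.get_num_interop_threads()` report PyTorch's intra-op and inter-op thread pool sizes.

```python
import os


def thread_report():
    """Collect settings that commonly limit how many CPU threads a process uses."""
    report = {
        "cpu_count": os.cpu_count(),
        # Thread caps read by OpenMP / MKL backends; None means unset.
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "MKL_NUM_THREADS": os.environ.get("MKL_NUM_THREADS"),
    }
    # CPU affinity is only exposed on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        report["usable_cpus"] = len(os.sched_getaffinity(0))
    try:
        import torch  # optional dependency for this sketch

        report["torch_intraop_threads"] = torch.get_num_threads()
        report["torch_interop_threads"] = torch.get_num_interop_threads()
    except ImportError:
        report["torch_intraop_threads"] = None
        report["torch_interop_threads"] = None
    return report


for key, value in thread_report().items():
    print(f"{key}: {value}")
```

If the thread counts here are greater than one but htop still shows a single busy core during offloaded generation, the limit is more likely in how the work is scheduled (for example, waiting on transfers) than in a thread cap.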

Ph0rk0z avatar Mar 05 '23 19:03 Ph0rk0z

I see that GPTQ and RWKV can use all cores, so this is mainly an accelerate problem.

Ph0rk0z avatar Mar 15 '23 17:03 Ph0rk0z

Any update on this? This is currently a huge bottleneck for me.

dhcracchiolo avatar Jun 13 '23 01:06 dhcracchiolo

Hard to say. When I used multiple cores for the things that supported it, it was often slower. But people report CPU bottlenecks all over the place.

Ph0rk0z avatar Jun 13 '23 13:06 Ph0rk0z

Looking forward to this. Why is this closed?

sieu-n avatar Jul 19 '23 04:07 sieu-n