Use as many threads as possible

Open ItsPi3141 opened this issue 2 years ago • 12 comments

faster generation! 😁👍

ItsPi3141 avatar Apr 22 '23 19:04 ItsPi3141

not sure why you're still checking for sysThreads == 4 but code-wise it looks OK

I'm doing that so that when there are 4 threads, it will use all 4 threads instead of just 2. For higher thread counts, performance is better if not all threads are used, e.g. 4/8 threads is faster than 8/8 threads. However, when comparing 2/4 threads with 4/4 threads, using all 4 is still faster even though it slows the system down a bit. This is why an exception has to be made for a system with 4 threads.

ItsPi3141 avatar Apr 22 '23 22:04 ItsPi3141

in this case you should use i <= sysThreads. then if sysThreads = 4 it would also go into the loop. i think your target is setting the number of threads to run to the number of CPUs on the system, but as a power of 2. but in this case if your system has 8 or 16 CPUs it would also not go into the loop and you'd need to check that separately.

Kokujou avatar Apr 23 '23 06:04 Kokujou

No, it works exactly as intended. If it has 16 threads, it should use 8 out of 16. If it has 8 threads it should use 4 out of 8. If it has 4 threads, it should use all 4.

ItsPi3141 avatar Apr 23 '23 06:04 ItsPi3141
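
The rule described above can be sketched as follows. This is only an illustration in Python, not the code from the PR: the function name `pick_thread_count` and the `inclusive` flag are hypothetical, and the handling of machines with fewer than 4 threads or a thread count that is not a power of two is an assumption, since the thread only states the results for 4, 8, and 16 threads.

```python
def pick_thread_count(sys_threads: int, inclusive: bool = False) -> int:
    """Pick an inference thread count for a machine with sys_threads logical CPUs.

    inclusive=False mimics a strict `i < sysThreads` loop bound;
    inclusive=True mimics the suggested `i <= sysThreads` bound.
    """
    if sys_threads < 1:
        raise ValueError("sys_threads must be at least 1")

    threads = 1  # assumed fallback for very small machines
    power = 1
    # Walk the powers of two and keep the last one that passes the bound check.
    while (power <= sys_threads) if inclusive else (power < sys_threads):
        threads = power
        power *= 2

    # Exception from the thread: 4/4 threads was measured faster than 2/4.
    if sys_threads == 4:
        threads = 4
    return threads


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, pick_thread_count(n), pick_thread_count(n, inclusive=True))
```

With the strict bound, 16 threads map to 8, 8 map to 4, and the special case lifts 4 back to 4. With the inclusive bound, 8 and 16 would map to themselves, which is not the behaviour the author describes as intended.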

why? shouldn't it always use all of them?

Kokujou avatar Apr 24 '23 14:04 Kokujou

No, because according to the tests that some other people and I conducted last month, using all of the threads basically makes the OS itself (specifically Windows' explorer.exe) lag, and the LLM inference lags along with it. It's best not to use all of the threads.

ItsPi3141 avatar Apr 24 '23 14:04 ItsPi3141

does it need to be a power of two? because according to that information i could imagine two ways:

  1. make the process priority low, so explorer.exe has priority and gets assigned all the calculation power it needs
  2. use max-1 threads

Kokujou avatar Apr 24 '23 16:04 Kokujou
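
As a rough illustration of the first option, a process can ask the OS to lower its own priority. The sketch below uses Python's `os.nice`, which is only available on Unix-like systems; the helper name `lower_priority` and the default increment are hypothetical, and nothing in this thread shows that the project does this. Node.js exposes a comparable `os.setPriority` call, which would be the closer fit for an Electron app.

```python
import os
from typing import Optional


def lower_priority(increment: int = 5) -> Optional[int]:
    """Raise this process's niceness (i.e. lower its scheduling priority).

    Returns the new niceness, or None where os.nice is not available
    (for example on Windows).
    """
    if not hasattr(os, "nice"):
        return None
    # A positive increment makes the process "nicer" to other processes.
    return os.nice(increment)


if __name__ == "__main__":
    print("new niceness:", lower_priority())
```

Whether this helps or hurts inference speed would have to be measured; the thread does not settle that question.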

It does need to be a power of 2. Don't know how I could make it low priority. But it will probably give worse performance if other processes are given priority.

ItsPi3141 avatar Apr 24 '23 17:04 ItsPi3141

not necessarily, if i'm well enough informed. i mean this PR can be merged anyway because no matter what's decided afterwards it's an improvement. if it doesn't need to be a power of two, then it'd probably be easier to use sysCount -1, or sysCount -2 if you want to be sure.

if you wanted to be fancy you could even do steps in which you decrease the count by an additional 1. e.g. from 5-10 it's -1, from 10-20 it's -2 and so on.

sadly i'm only familiar with C#. in C# the Process object has a PriorityClass that you could set to your desired priority. for everything else you could consult chatGPT :D

Kokujou avatar Apr 24 '23 19:04 Kokujou
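
The stepped idea from the comment above could look roughly like this. It is a sketch of the suggestion only, not something the project adopted (the author states that the count needs to be a power of two). The name `stepped_thread_count` is hypothetical, and the exact boundaries are an assumption: the comment gives overlapping ranges, so here one more thread is held back at 5, 10, 20, 40, and so on.

```python
def stepped_thread_count(sys_threads: int) -> int:
    """Hold back one extra thread each time a doubling threshold is reached.

    Assumed thresholds: 5, 10, 20, 40, ... (starting at 5 and doubling).
    """
    if sys_threads < 1:
        raise ValueError("sys_threads must be at least 1")

    reserved = 0
    threshold = 5
    while sys_threads >= threshold:
        reserved += 1
        threshold *= 2
    # Never drop below a single worker thread.
    return max(1, sys_threads - reserved)


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, stepped_thread_count(n))
```

Under these assumptions a 4-thread machine keeps all 4 threads, an 8-thread machine uses 7, and a 16-thread machine uses 14.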

It DOES need to be a power of two though. It's been tested and determined that that's how it works.

ItsPi3141 avatar Apr 24 '23 20:04 ItsPi3141

oh sorry, i read "doesn't" >.< so sorry about that. yeah, then maybe a later investigation into process priority could help boost the performance even more. maybe at a later stage.

i hope i didn't bother you too much sharing my thoughts, and thanks for explaining a bit on how this stuff works :)

Kokujou avatar Apr 25 '23 05:04 Kokujou

Haha dw you didn't bother me

ItsPi3141 avatar Apr 25 '23 05:04 ItsPi3141

umm... as nothing's happening on this thread, is there any possibility you could provide me with a build or so for the version including this cherry pick? i tried checking out this stuff by myself and after 2 hours of only getting weird errors i gave up.

Kokujou avatar May 13 '23 14:05 Kokujou