Faisal Zaghloul comments

Results: 13 comments of Faisal Zaghloul

Threadpool: take 2

Here are some perf figures:

On W-2225 Xeon machine:

CPU backend:

| CPU | Model | Test | t/s master | t/s threadpool | Speedup |
|:--------------------------------------|:--------------|:-------|-------------:|-----------------:|----------:|
| Intel(R) Xeon(R)...

Threadpool: take 2

@slaren Threadpool is back! Updated it a bit to be aligned with the latest graph-compute design. The current performance is largely on par with OpenMP. Please lmk if you have...

Threadpool: take 2

> I tried to test this on macOS, but it seems to deadlock.

Fixed!

Threadpool: take 2

On M2 Max: (GGML_NO_METAL=1 GGML_NO_ACCELERATE=1)

| CPU | Model | Threads | Test | t/s master | t/s threadpool | Speedup |
|:------|:--------------|----------:|:-------|-------------:|-----------------:|----------:|
| | llama 7B Q4_0 | 4...

Threadpool: take 2

@slaren lmk if it works for you this time

Threadpool: take 2

> I tested this again on the M3 Max, but it still seems to deadlock. These are the call stacks of the threads: > > ``` > (lldb) bt all...

Threadpool: take 2

> > I tested this again on the M3 Max, but it still seems to deadlock. These are the call stacks of the threads: > > ``` > > (lldb)...

Threadpool: take 2

> Thanks, I was able to run it now. Unfortunately the results are still not very good on my system. Under WSL this threadpool is much slower than OpenMP. A...

Threadpool: take 2

Edit: I totally forgot that GGML_OPENMP is disabled only for CMake builds... So the numbers below are OpenMP only. (Interesting that there is any change at all...) @slaren @max-krasnyansky latest...