Peter Simon

Results: 47 comments by Peter Simon

This is a continuation of the benchmarking I presented in #159. There, I presented the results of running the `perf/lu.jl` script of the `RecursiveFactorization` package for a Linux desktop machine....

Out of curiosity I commented out the line in the script

```julia
#BLAS.set_num_threads(nc)
```

and restarted Julia with `-t 8` using OpenBLAS. The result is:

![new_lu_float64_1 7 3_skylake_8cores_OpenBLAS](https://user-images.githubusercontent.com/4294361/179660974-3737d44a-b5c0-4d60-a8d7-820ccf4f9903.png)

That didn't...

Why does OpenBLAS perform so much worse on Windows than on Linux?

8 threads on my system:

```julia
julia> versioninfo()
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
  WORD_SIZE: 64...
```

I'm happy to wait for the rewrite. However, I'm also not seeing the advertised speedup on Float64:

| Algorithm | n = 200 | n = 500 | n =...

So the default algorithm selected was `RFLUFactorization` for the n = 200 and n = 500 cases, and it shouldn't be expected to be competitive for the n = 2000...
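To compare the heuristic's pick against an explicit choice, one can bypass the size-based default and request `RFLUFactorization` directly. This is a minimal sketch assuming LinearSolve.jl's documented `LinearProblem`/`solve` interface; the matrix here is a hypothetical well-conditioned test case, not the benchmark's data.

```julia
using LinearSolve, LinearAlgebra

n = 200
A = rand(n, n) + n * I   # diagonally dominated, so the solve is well-conditioned
b = rand(n)

prob = LinearProblem(A, b)

# Force the recursive LU path instead of letting the default heuristic choose.
sol = solve(prob, RFLUFactorization())

# Sanity check: the residual should be tiny relative to b.
@assert norm(A * sol.u - b) / norm(b) < 1e-8
```

Timing this against `solve(prob)` (the default) at each size is a quick way to see where the crossover to the BLAS-backed factorization happens on a given machine.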

```julia
julia> versioninfo(verbose=true)
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu) "Manjaro Linux"
  uname: Linux 5.10.126-1-MANJARO #1 SMP PREEMPT Mon Jun 27 10:02:42 UTC 2022...
```

Thanks for your fast response on this issue. Looking forward to using this for complex matrices (ubiquitous in my work) in the future.

Julia was started with 8 threads. BLAS threading was set by the script:

```julia
nc = min(Int(VectorizationBase.num_cores()), Threads.nthreads())
BLAS.set_num_threads(nc)
```

which looks like it would be 8 as well.
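A quick way to confirm what the script actually set is to read the thread count back. This sketch uses only stdlib calls; `Sys.CPU_THREADS` stands in for `VectorizationBase.num_cores()` (an assumption to keep it self-contained, since `Sys.CPU_THREADS` counts logical rather than physical cores).

```julia
using LinearAlgebra   # provides the BLAS submodule

# Mirror the script's logic with a stdlib stand-in for the core count.
nc = min(Sys.CPU_THREADS, Threads.nthreads())
BLAS.set_num_threads(nc)

# Read the setting back to verify it took effect.
println("BLAS threads: ", BLAS.get_num_threads())
```

`BLAS.get_num_threads()` is worth checking after `using MKL` too, since swapping the backend can change the effective thread count.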

For completeness, here is the result of benchmarking after `using MKL`: ![lu_float64_1 7 3_skylake_8cores_MKL](https://user-images.githubusercontent.com/4294361/179604880-7273009e-03d5-41b0-b625-187f61e300ab.png)