Andrew
It is here: https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt Search for `mitigations=`. Most likely `mitigations=off` would "normalize" the performance; then bisect, i.e. apply half, then the other half, and that way determine the one doing the damage...
It is important to know which mitigation should be worked around this way, e.g. AMD is fine, AMD-style return trampoline or not. memset does only damage. Haswell - ??? Microcode...
So it has to be addressed conditionally, just to determine the "mitigations" combo that triggers the pessimality.....
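The "apply half, then the other half" search can be sketched roughly as below. This is only a sketch under assumptions: it supposes exactly one mitigation is responsible, `find_culprit` and `is_fast_when_disabled` are hypothetical names, and the option list is an illustrative subset of what kernel-parameters.txt documents. In practice the predicate would mean "reboot with these options on the kernel command line and rerun the benchmark".

```python
from typing import Callable, List, Sequence

# Illustrative subset of per-mitigation switches from kernel-parameters.txt;
# check that file for what applies to your kernel and CPU.
CANDIDATES = [
    "nopti",
    "nospectre_v1",
    "nospectre_v2",
    "spec_store_bypass_disable=off",
    "l1tf=off",
    "mds=off",
]


def find_culprit(
    candidates: Sequence[str],
    is_fast_when_disabled: Callable[[List[str]], bool],
) -> str:
    """Bisect for the single option whose mitigation causes the slowdown.

    is_fast_when_disabled(opts) is a hypothetical callback: it should return
    True when booting with only `opts` added restores normal performance.
    Assumes exactly one culprit among the candidates.
    """
    remaining = list(candidates)
    if not remaining:
        raise ValueError("no candidates to bisect")
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        if is_fast_when_disabled(half):
            # Disabling this half fixed it, so the culprit is inside it.
            remaining = half
        else:
            # Still slow, so the culprit is among the options not yet disabled.
            remaining = remaining[len(half):]
    return remaining[0]
```

If more than one mitigation contributes, this simple halving will not be enough and each option would have to be toggled separately.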
Probably worth making into a task list, sorted by absent guard and suspected wrong guard.
https://github.com/xianyi/OpenBLAS/issues/1989#issuecomment-460371387 has a test case vs `dsymv` via LAPACK; there is no performance regression, but no thread threshold either.
@drhpc the initial idea is fairly simple: to draw the line in the sand where computing switches from one to all CPUs. It might be slightly dated with more and...
I moved to another discipline of computing. If you have time, at least give it a try: * In `interfaces/` pick one that does not have a thread threshold, so you...
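To illustrate the "line in the sand" idea, here is a rough sketch of deriving a threshold from timings and then applying it. It is not how OpenBLAS implements its guards; `find_threshold` and `pick_threads` are hypothetical helpers, and the timings are assumed to come from your own benchmark runs with one thread and with all threads.

```python
from typing import Dict, Optional


def find_threshold(
    t_single: Dict[int, float], t_all: Dict[int, float]
) -> Optional[int]:
    """Smallest measured size from which all-threads wins at every larger size.

    t_single and t_all map problem size -> run time (same sizes in both).
    Returns None if the all-threads run is not faster even at the largest size.
    """
    threshold = None
    for n in sorted(t_single, reverse=True):
        if t_all[n] < t_single[n]:
            threshold = n  # still winning, keep moving the line down
        else:
            break  # single thread wins here, stop at the previous size
    return threshold


def pick_threads(n: int, max_threads: int, threshold: Optional[int]) -> int:
    """Use one thread below the threshold, all threads at or above it."""
    if threshold is None or n < threshold:
        return 1
    return max_threads
```

The threshold found this way is specific to the machine and build it was measured on, which is why such a value can become dated as core counts grow.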
`CHOLMOD/Demo/Readme.txt` explains that. OpenBLAS must be built by typing `make`. I confirm Martin's result on early pre-AVX512 Skylake with 6 cores + HT; on AMD, though, the tables turn the other way...
> … is that the whole issue? Was someone using an openmp program that multiplied threads by recursively parallel OpenBLAS? Quite often, at least the stated performance impact looks that spectacular....
Normally it is to run with 1, 2 and all NUM_THREADS and compare the top 10 in `perf record -p ; perf report`. It was sched_yield often, but with that now out...
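Comparing the top entries between runs can be done by eye, but a small helper makes the shift easier to spot. The sketch below assumes you have already turned each `perf report` into a mapping of symbol to percentage of samples (the parsing is left out on purpose); `top_growth` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def top_growth(
    base: Dict[str, float], other: Dict[str, float], n: int = 10
) -> List[Tuple[str, float]]:
    """Symbols whose share of samples grew the most from `base` to `other`.

    Both arguments map symbol name -> percent of samples, e.g. the
    single-thread run as `base` and the all-threads run as `other`.
    A symbol missing from one run counts as 0 percent there.
    """
    symbols = set(base) | set(other)
    deltas = {s: other.get(s, 0.0) - base.get(s, 0.0) for s in symbols}
    # Largest increase first; ties broken by name for stable output.
    return sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

A symbol that jumps to the top only in the multi-threaded run is the first thing to look at.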