Andrew
zlasr_ is single-threaded. zhemv is now single-threaded for small inputs to avoid damage from excess threading. Yes, the performance target is not to stress-test CPU cooling, but to give result...
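The idea of staying on one thread for small inputs can be sketched as a simple size cutoff. This is a minimal illustration only: `choose_threads` and `SMALL_N_CUTOFF` are hypothetical names, and the cutoff value is an assumption, not the threshold OpenBLAS actually uses.

```c
#include <stddef.h>

/* Hypothetical cutoff: below this matrix order, stay on one thread.
   The value is illustrative and not taken from OpenBLAS. */
#define SMALL_N_CUTOFF 200

/* Pick a thread count for an n-by-n matrix-vector operation.
   Small problems run single-threaded because the cost of waking and
   synchronizing worker threads would exceed the work itself. */
int choose_threads(size_t n, int max_threads)
{
    if (max_threads < 1)
        return 1;               /* guard against a nonsensical limit */
    if (n < SMALL_N_CUTOFF)
        return 1;               /* threading overhead would dominate */
    return max_threads;
}
```

A real cutoff would have to be measured per kernel and per CPU; the point is only that the decision is made from the problem size before any threads are started.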
It is not L3 cache per core or per NUMA domain; it is per socket, roughly 1-2MB per core, in place of Haswell's 2.5MB. Smaller than zen1 L1d actually matches that...
> but the memory latency is still a problem Are you serious? You know that X GHz memory serves that many words per second; there is no shortcut (There is...
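The "that many words per second" argument is plain arithmetic: the peak rate of one memory channel is its transfer rate times its width, divided by the word size. The sketch below assumes a 64-bit (8-byte) channel and a 3200 MT/s module purely as example figures; `peak_words_per_sec` is a hypothetical helper, not part of any library.

```c
#include <stdint.h>

/* Peak words per second for one memory channel.
   transfers_per_sec: e.g. 3200000000 for a 3200 MT/s module (assumed figure)
   bus_bytes:         channel width in bytes (8 for a 64-bit channel)
   word_bytes:        size of one word, e.g. 8 for a double */
uint64_t peak_words_per_sec(uint64_t transfers_per_sec,
                            uint64_t bus_bytes,
                            uint64_t word_bytes)
{
    return transfers_per_sec * bus_bytes / word_bytes;
}
```

With the assumed figures, one channel delivers at most 3.2e9 doubles per second, so any kernel that touches each operand once cannot go faster than that no matter how it is threaded.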
AMD looks like 4-core clusters? Does that show up in NUMA tables anywhere?
Well, not exposed, but 3x faster... It is quite important that the same data does not get dragged around the outer cache without need. There is sort of no software exposure,...
It is HyperTransport (Intel's rough equivalent is QPI). Though I have no idea how the modern one behaves around clocking, power saving, etc....
The first question is why you would want to distribute non-releases. If you pull a non-release, you can tag it in Makefile.rule to your liking. Information like the tarball checksum,...
The present situation, with some.release.id-dev popping out of pip, leaves us quite helpless sometimes; actually, you can download one from the git project front page without any tags inside. E.g. you can...
There is just no way two sets of OMP mutex functions with the same names could ever work. Your faulty setup is doomed to fail with or without OpenBLAS.
@lightsighter both iomp and clang omp simulate gomp symbols to act as replacements. The library you point to just permits nesting KMP with OMP; you could nest native threads...