Andrew

Results 724 comments of Andrew

OpenBLAS calls omp_get_num_threads() just once per program's lifetime and has no instrumentation for OpenMP nesting. Unless you use nesting properly, the thread count will square, openblas evicted far from...
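
To make "thread count will square" concrete, here is a minimal arithmetic sketch. It does not call OpenBLAS or OpenMP; the function name and the 8-core machine are hypothetical, and it simply assumes every outer worker starts its own full-size BLAS thread team.

```python
def total_threads(outer_threads: int, blas_threads: int) -> int:
    """Threads alive when each outer worker starts its own BLAS thread team."""
    return outer_threads * blas_threads


cores = 8  # hypothetical machine, for illustration only

# Nested case: 8 outer workers, each calling BLAS with 8 threads.
print(total_threads(cores, cores))

# One common way out: keep the outer parallelism and run BLAS single-threaded.
print(total_threads(cores, 1))
```

Under that assumption the nested case oversubscribes the machine by a factor equal to the core count, which is why limiting one of the two levels is usually suggested.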

Could you open a new issue? Your request cannot be satisfied by time-travelling 3 years back to when this issue was closed. The Intel documentation you refer to states unpredictability.

`-O3` takes risky optimisations around the FPU; best to stick with `-O2` to avoid unpleasant eurekas. Any such change would require a rewrite of the dozen-year-old sbrk()-based internal memory...

You can disable VM overcommit in various ways. Historically, POSIX shm was well aligned and resident; you can use that for performance.

You cast some doubt on the internal allocator, which may or may not overcommit in the pessimal way you illustrated.

It is an artificial sample that represents a memory access pattern against thin allocation and page faults on a still unnamed platform.
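
The sample under discussion is not shown here. As a separate, hedged illustration of the general effect, the sketch below maps anonymous memory, touches one byte per page twice, and reports how many minor page faults each pass caused. The function names are made up for this example, `resource` is Unix-only, and the counts will vary by kernel, huge-page settings, and platform.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def minor_faults() -> int:
    """Minor page faults charged to this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_pages(buf, n_pages: int) -> None:
    """Write one byte into every page so the kernel must back it with memory."""
    for i in range(n_pages):
        buf[i * PAGE] = 1


def first_touch_faults(n_pages: int = 4096):
    """Return (faults on first touch, faults on second touch) for a fresh mapping."""
    buf = mmap.mmap(-1, n_pages * PAGE)  # anonymous mapping, not yet resident
    try:
        before = minor_faults()
        touch_pages(buf, n_pages)
        first = minor_faults() - before

        before = minor_faults()
        touch_pages(buf, n_pages)
        second = minor_faults() - before
        return first, second
    finally:
        buf.close()


print(first_touch_faults())
```

On a typical Linux system the first pass is expected to fault roughly once per page and the second pass hardly at all, which is the cost a thinly allocated buffer pays on first use.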

In another thread, before this was posted, it was said that _DOT memsets while _GEMM does not. I need to take a look around `sysctl vm.overcommit[tab]`. There is a misconception it...

What is your sysctl vm.overcommit_memory value? I get double the time when setting it to 1; 0 and 2 are almost alike.
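
For anyone answering that question without the `sysctl` tool at hand, a minimal sketch follows. It assumes a Linux system exposing the setting at `/proc/sys/vm/overcommit_memory`; the helper name and the `MODES` table are illustrative, not part of any library.

```python
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

# Meanings as described in the Linux kernel's overcommit accounting docs.
MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def read_overcommit_mode(path: Path = OVERCOMMIT_PATH):
    """Return the current mode as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


mode = read_overcommit_mode()
print(mode, MODES.get(mode, "unknown, or not a Linux system"))
```

Changing the value requires root, for example `sysctl vm.overcommit_memory=2`, so the sketch only reads it.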

Apparently your attempted optimisation of using mmap() in place of malloc() does not work correctly. Python uses standard functions.

AMD Piledriver is not affected by any; it must be something to do with page faults vs Spectre mitigations. An extra memset always makes it slower here. For comparison of memset speed...