Andrew
zhemv has no threading threshold. @tjoli, do you see execution time halved with a single thread?
I am inclined to think the zlasr work is the same in both cases, and that it just takes 30% or 80% of the time depending on the zhemv waste?
@martin-frbg I will @tjoli thanks, that just confirms initial suspicion. dlasr time derived from your data 28s : 32s: 34s
@martin-frbg philosophically, the lower threshold is the page size for input and a cache line for output, assuming all chunks are aligned; the upper bound is the L3 cache (i.e. we do not add an extra thread to...
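A rough sketch of the lower-bound part of that idea, in C. Everything here is an assumption for illustration: `moderate_threads` is a hypothetical helper, not an OpenBLAS function, and the 4096-byte page and 64-byte cache line are typical values rather than anything queried from the system. The L3 upper bound is left out because it is not spelled out above.

```c
#include <stddef.h>

/* Assumed typical sizes; real code would query the platform. */
#define ASSUMED_PAGE_BYTES       4096
#define ASSUMED_CACHE_LINE_BYTES 64

/*
 * Hypothetical helper: cap the thread count so that each thread gets
 * at least one page of input and one cache line of output.
 * Always returns a value between 1 and max_threads (at least 1).
 */
int moderate_threads(size_t input_bytes, size_t output_bytes, int max_threads)
{
    size_t by_input  = input_bytes / ASSUMED_PAGE_BYTES;
    size_t by_output = output_bytes / ASSUMED_CACHE_LINE_BYTES;
    size_t limit     = by_input < by_output ? by_input : by_output;

    if (limit < 1)
        limit = 1;              /* never go below a single thread */
    if (max_threads < 1)
        max_threads = 1;

    return limit < (size_t)max_threads ? (int)limit : max_threads;
}
```

For a 16x16 double-complex matrix (4096 bytes of input, 256 bytes of output vector) this stays at one thread whatever `max_threads` is, since the input is a single page.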
@tjoli a temporary fix, while I work out better moderation of threading, is to disable it altogether in interface/zhemv.c: ``` #ifdef SMP nthreads = num_cpu_avail(); ``` becomes ``` #ifdef SMP nthreads =...
Using the included benchmarks, `OPENBLAS_NUM_THREADS=X ./yhemv.goto 16...16000 16`, and pressing ^C once the numbers seem stable: zhemv saturates 1 core around a 128x128 sample and 2 cores around 320x320; chemv 480 and 480; 2-core version is...
 X=N, Y=GFlops. The 2-thread run gets above the 1-thread run below the size of the 2MB L3 cache (~1.8MB input), then it steadily saturates the CPU above the L3 cache size (i.e. each CPU core is at...
Jitter made me wonder too. No CPU temperature issues, nothing anomalous anywhere. Could be the gemm threshold, or a fixed megabyte as well; at least the latter is closer to the truth....
The method and assumptions regarding underlying mechanism were quite simple. * determine when compute resource saturates iterating through viable _NUM_THREADS/taskset combinations. ./Xhemv.goto 32 3200 32 and see it stabilize Go...
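One way to turn "see it stabilize" into a number, sketched in C. This is only an illustration of the saturation check, not what the benchmark itself does: `saturation_index` is a hypothetical helper, and the idea of a relative tolerance `tol` is an assumption. It takes the GFlops column from one run (one thread count) and returns the first sample from which throughput stays within `tol` of the peak.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given n throughput samples in increasing
 * problem size, return the smallest index i such that every sample
 * from i to the end is at least (1 - tol) * peak.
 * Returns 0 for an empty series.
 */
size_t saturation_index(const double *gflops, size_t n, double tol)
{
    if (n == 0)
        return 0;

    /* Find the peak throughput over the whole run. */
    double peak = gflops[0];
    for (size_t i = 1; i < n; i++)
        if (gflops[i] > peak)
            peak = gflops[i];

    /* Walk back from the end while samples stay near the peak. */
    double cutoff = (1.0 - tol) * peak;
    size_t i = n;
    while (i > 0 && gflops[i - 1] >= cutoff)
        i--;

    return i;
}
```

Running this on the output for each `OPENBLAS_NUM_THREADS` value gives one saturation size per thread count; jitter in the tail pushes the index later, so `tol` needs to be wider than the run-to-run noise.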
@tjoli - can you test? It is a very rough and simple change.