Field G. Van Zee

177 comments by Field G. Van Zee

@stepannassyr Please give 9bb23e6 (which is currently the head of the `dev` branch) a try and let me know if any tweaks are required before merging into `master`.

Thanks, @stepannassyr. Please keep us updated.

We have no plans at this time. But I say that literally. You may or may not have already noticed this, but some of these compound ("fused") level-2 operations, such...

> The increase in cache coherency traffic can be offset by the savings of sharing B.

Can you elaborate on the savings you're referring to here? The regime I envision...

Let's assume `ic_nt` = 4 and `jc_nt` = 2. This results in two packed panels of B being created. Each panel of B would be shared across 4 threads.
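To make the counting above concrete, here is a minimal sketch of which threads would end up sharing each packed panel of B. It assumes all other loops are unparallelized and that thread ids are numbered so that consecutive groups of `ic_nt` threads belong to the same jc-loop way. That numbering and the helper name `b_panel_sharing` are hypothetical, chosen only for illustration, and are not taken from the BLIS source.

```python
from collections import defaultdict


def b_panel_sharing(jc_nt, ic_nt):
    """Group thread ids by the packed panel of B they would share.

    Assumption (illustrative only): thread `tid` belongs to jc-loop way
    `tid // ic_nt`, and one packed panel of B exists per jc-loop way.
    """
    groups = defaultdict(list)
    for tid in range(jc_nt * ic_nt):
        jc_id = tid // ic_nt  # hypothetical thread numbering
        groups[jc_id].append(tid)
    return dict(groups)


sharing = b_panel_sharing(jc_nt=2, ic_nt=4)
for panel, threads in sharing.items():
    print(f"B panel {panel}: shared by threads {threads}")
```

Under the same assumptions, calling `b_panel_sharing(jc_nt=1, ic_nt=8)` would give a single panel of B shared by all eight threads.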

I guess what I don't quite follow is this perceived benefit of threads "sharing" B.

FWIW, @dnparikh already has performance data for `trsm` on ThunderX2 (which has a private L2 cache) that shows a *big* difference between pushing all parallelism to the jr loop vs....

> Yes I imagine relying on cache coherency performance on ARM is a problem.

Can you elaborate?

I couldn't gather any meaningful inferences from the data I collected on my Haswell workstation. (Not enough cores to play with.) Maybe I'm willing to punt on this issue for...

I understand. This issue was always about changing the defaults to values that would serve as the best starting point.