Devin Matthews

264 comments by Devin Matthews

It still makes sense to have `JR_NT > 1` with private L2 caches in some (many?) cases. The increase in cache coherency traffic can be offset by the savings of...

You always get `IC_NT` independent panels of B...right?

Er, I mean `JC_NT`. But the amount of sharing is less.

Right, instead of one panel shared across 8 threads if we had `jr_nt = 2`.
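To make the sharing arithmetic in this exchange concrete, here is a minimal sketch. It assumes the usual BLIS loop ordering (JC, PC, IC, JR, IR) with B packed once per JC thread group and A packed once per IC thread group, and no parallelism in the PC loop. The function name `sharing` and the dictionary keys are hypothetical, chosen only for illustration; they are not part of BLIS.

```python
def sharing(jc_nt=1, ic_nt=1, jr_nt=1, ir_nt=1):
    """Count packed buffers and the threads sharing each one.

    Assumption: B is packed per JC group and A per IC group,
    so every thread below that loop level reads the same buffer.
    """
    return {
        "total_threads": jc_nt * ic_nt * jr_nt * ir_nt,
        # One packed panel of B per JC-loop thread group.
        "b_panels": jc_nt,
        "threads_per_b_panel": ic_nt * jr_nt * ir_nt,
        # One packed block of A per (JC, IC) thread group.
        "a_blocks": jc_nt * ic_nt,
        "threads_per_a_block": jr_nt * ir_nt,
    }


if __name__ == "__main__":
    # Three ways of splitting 8 threads across the loops.
    for ways in ({"ic_nt": 8},
                 {"ic_nt": 4, "jr_nt": 2},
                 {"jc_nt": 2, "ic_nt": 4}):
        print(ways, sharing(**ways))
```

Under these assumptions, moving ways of parallelism from the IC loop to the JR loop leaves the number of threads reading each B panel unchanged but increases how many threads read the same block of A, whereas raising `jc_nt` is what creates additional independent B panels.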

I am badly explaining what was told to me by @tlrmchlsmth some time ago. But the benchmark is the ultimate authority.

Yes, I imagine relying on cache coherency performance on ARM is a problem.

Intel has very fast communication between private caches through the ring bus (mesh network on newer chips). I highly doubt ARM can hold a candle to it.

Also note that these settings are (or are supposed to be) per-configuration.

@fgvanzee I'll look at the changes in more detail. This was discussed in #437 and only affects cases where `BLIS_JR_NT` is probably too large.

Yeah, I'm not sure I'm comfortable with this. I really think the better answer is just to not use so many threads in the JR loop. @hominhquan Is there a...