Devin Matthews

Results 264 comments of Devin Matthews

@jeffhammond suggestions for any specific architectures?

@hominhquan thanks for your analysis. In practice, `BLIS_IR_NT` is always 1, as threading this loop just doesn't make sense on any architecture we've seen. Without diving into the details on...

I should also add that realistically `BLIS_JR_NT`...

OK. Do you have any performance numbers? It sounds like we can improve general parallel performance then.

@hominhquan is the general issue that `BLIS_JR_NT` should be a divisor of `BLIS_NC/BLIS_NR`? As I mentioned before, `BLIS_JR_NT` also shouldn't be very large, maybe 4 at most.
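The divisibility question above can be sketched in a few lines. This is only an illustration, not BLIS code: the helper name `valid_jr_nt`, the `max_ways` cap of 4 (taken from the "maybe 4 at most" remark), and the sample `NC`/`NR` values are all assumptions, and the sketch assumes `NC` is a multiple of `NR`.

```python
def valid_jr_nt(nc, nr, max_ways=4):
    """List candidate BLIS_JR_NT values for a given NC and NR.

    Hypothetical helper: a candidate qualifies if it evenly divides
    the number of NR-wide micro-panels in an NC-wide block (NC / NR)
    and does not exceed max_ways.
    """
    if nc <= 0 or nr <= 0 or nc % nr != 0:
        # Assumption of this sketch: NC is a positive multiple of NR.
        raise ValueError("expected positive nc and nr with nc a multiple of nr")
    panels = nc // nr
    return [ways for ways in range(1, max_ways + 1) if panels % ways == 0]


# Illustrative block sizes only; real values depend on the BLIS configuration.
candidates = valid_jr_nt(4080, 6)
```

With these sample values `NC / NR` is 680, so 3 is excluded because it would leave the threads with uneven numbers of micro-panels.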

No, BLIS can't override the user's choices, but I guess then it's up to the user to not do that. I just added some notes to the Multithreading documentation.

@hominhquan I just had an interesting conversation where some cases were pointed out where `BLIS_IR_NT > 1` makes sense performance-wise. Does `BLIS_IR_NT = BLIS_JR_NT = 4` give you reasonable performance and...

@jeffhammond I'm not sure what you mean by this. I gather there is still a bug, but I'm not sure what...

> up until now we knew all desktop Skylakes to be sans AVX-512

Except for Skylake-X i7 and i9! And Cannon Lake! And Cascade Lake!