Devin Matthews
@mratsim can you send the output of `/proc/cpuinfo` or the equivalent on your platform?
@mratsim w.r.t. threading, I have been meaning for some time to port BLIS to my [TCI](https://github.com/devinamatthews/tci) threading library, which I use in [TBLIS](https://github.com/devinamatthews/tblis). This library can use either thread-based (OpenMP,...
@mratsim is your CPU still misidentified? If so please send the full output of configure.
@mratsim The code in BLIS is slightly different from @jeffhammond's code. Can you test with BLIS? Configuring with `configure auto` should show that it selects the `skx2` sub-configuration.
@hominhquan by broadcast I just mean pointer broadcast. Only the main thread includes the beta*C part.
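
To make the beta*C point concrete, here is a minimal sketch of splitting the k dimension across workers. It is not BLIS code: `k_partial`, `k_reduce`, the per-worker `partial` buffers, and the row-major layout are all assumptions made for illustration, and the workers are meant to be called from whatever threading layer is in use. Every worker reads the same `c` pointer (the "broadcast" pointer), but only worker 0 folds in `beta * C`, so the scaling is applied exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker: computes this worker's share of
 *   C = beta*C + alpha*A*B
 * for row-major A (m x k), B (k x n), C (m x n), over its slice of k.
 * Only worker 0 includes the beta*C term; the others start from zero. */
void k_partial(size_t tid, size_t nt, size_t m, size_t n, size_t k,
               double alpha, const double *a, const double *b,
               double beta, const double *c, double *partial)
{
    assert(nt > 0 && tid < nt);

    /* Contiguous slice [k0, k1) of the k dimension owned by this worker. */
    size_t k0 = k * tid / nt;
    size_t k1 = k * (tid + 1) / nt;

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = (tid == 0) ? beta * c[i * n + j] : 0.0;
            for (size_t p = k0; p < k1; ++p)
                acc += alpha * a[i * k + p] * b[p * n + j];
            partial[i * n + j] = acc;
        }
    }
}

/* Hypothetical reduction: sums the nt partial results (stored back to
 * back, m*n elements each) into C. Run after all workers have finished. */
void k_reduce(size_t nt, size_t m, size_t n,
              const double *partials, double *c)
{
    for (size_t e = 0; e < m * n; ++e) {
        double sum = 0.0;
        for (size_t t = 0; t < nt; ++t)
            sum += partials[t * m * n + e];
        c[e] = sum;
    }
}
```

In a threaded setting each worker would call `k_partial` with its own `tid`, followed by a barrier and then `k_reduce`. The sketch does not special-case `beta == 0`, which a real BLAS-style routine would handle without reading C.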
@fgvanzee I welcome your comments.
> `ii. 1 if n > n_c`

I'm reconsidering this now, but my thinking was that a) you only want to parallelize over k if n is smallish anyways and...
Agreed. Drop that requirement.
@fgvanzee let's talk about this; IIRC carouseling doesn't take full advantage of parallelism in both dimensions.
Tuesday then.