Devin Matthews

264 comments by Devin Matthews

You aren't doing any threading along the M dimension (`BLIS_IC_NT`)?

Is this also a memory thing? Parallelizing along the IC loop would definitely be preferable. Alternatively, since you are currently just collapsing the IR/JR loops, why not set IR_NT=4 and...
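For reference, a minimal sketch of how per-loop threading can be requested through the BLIS environment variables (`BLIS_JC_NT`, `BLIS_IC_NT`, `BLIS_JR_NT`, `BLIS_IR_NT`). BLIS multiplies the per-loop ways together, so the total thread count is their product. The helpers `blis_thread_env` and `total_threads` and the `./my_benchmark` program are hypothetical, and the sketch assumes the variables are set before BLIS initializes, e.g. in a child process's environment.

```python
import os


def blis_thread_env(jc: int = 1, ic: int = 1, jr: int = 1, ir: int = 1) -> dict:
    """Build BLIS per-loop threading variables (hypothetical helper).

    Each value is the number of ways a given loop is parallelized;
    BLIS uses jc * ic * jr * ir threads in total.
    """
    ways = {"BLIS_JC_NT": jc, "BLIS_IC_NT": ic, "BLIS_JR_NT": jr, "BLIS_IR_NT": ir}
    for name, n in ways.items():
        if n < 1:
            raise ValueError(f"{name} must be >= 1, got {n}")
    return {name: str(n) for name, n in ways.items()}


def total_threads(env: dict) -> int:
    """Total thread count implied by the per-loop ways (unset loops count as 1)."""
    total = 1
    for name in ("BLIS_JC_NT", "BLIS_IC_NT", "BLIS_JR_NT", "BLIS_IR_NT"):
        total *= int(env.get(name, "1"))
    return total


if __name__ == "__main__":
    # Split 8 threads as 2 ways over the IC loop (M dimension) and 4 over JR.
    env = blis_thread_env(ic=2, jr=4)
    # Merge into the current environment for a child process, e.g.
    # subprocess.run(["./my_benchmark"], env=child_env)  # hypothetical binary
    child_env = {**os.environ, **env}
    print(env, total_threads(env))
```

Giving some of the ways to `BLIS_IC_NT` partitions the M dimension across threads instead of putting all parallelism on the collapsed JR/IR loops.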

@fgvanzee what might happen if the collapsed version were used all the time?

> But if we can find a more elegant way of expressing the logic that doesn't involve so much code duplication, I'm open to considering it. This was my concern...

@decandia50 if you configure using e.g. `configure intel` then it will compile in all the Intel architectures and select the proper one at runtime. While this isn't exactly the feature...

Oh, I did not see that it's *reproducibility* that is the main issue. I think this feature should be relatively easy to add, but I can't hazard a guess on...

Although, on Linux `perf` is a much better tool.

@drew-parsons can you please extract the BLAS/LAPACK operation and parameters this corresponds to and construct a minimal reproducer (Fortran or C/CBLAS)?

There's definitely a mismatch in the Fortran arguments. Here are the docs if you want to take a stab at fixing it (I can't take a look until at least...

This is always the problem with Python wrappers... how feasible is it to try and get a backtrace of the segfault when called from Python?
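One low-effort way to get at least the Python side of the backtrace is the standard-library `faulthandler` module, which dumps the traceback of every thread on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL. The sketch below assumes a Linux-like platform; `enable_crash_tracebacks` and `run_with_faulthandler` are hypothetical helper names, and `ctypes.string_at(0)` is only there to force a crash. Note that `faulthandler` shows Python frames only; for the native frames inside the BLAS library, running the script under `gdb --args python script.py` and typing `bt` after the crash gives the C-level backtrace.

```python
import faulthandler
import subprocess
import sys


def enable_crash_tracebacks() -> None:
    """Dump Python tracebacks of all threads to stderr on a fatal signal."""
    faulthandler.enable(file=sys.stderr, all_threads=True)


def run_with_faulthandler(script: str) -> subprocess.CompletedProcess:
    """Run a snippet in a child interpreter with faulthandler on (-X faulthandler)."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # In the real script, call this before importing the wrapped library.
    enable_crash_tracebacks()
    # Deliberately crash a child interpreter by reading address 0.
    result = run_with_faulthandler("import ctypes; ctypes.string_at(0)")
    print("exit code:", result.returncode)
    print(result.stderr)
```

Enabling it inside the script (or via `python -X faulthandler` / `PYTHONFAULTHANDLER=1`) needs no changes to the wrapper itself, which makes it a cheap first step before reaching for a debugger.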