Michael Yeh
Running the test suite, it looks like `s` and `d` are passing while `c` and `z` are failing. For `c` and `z`, when I change the register allocation from `fa4`, `fa5`...
After looking at the objdump, it looks like the compiler is using `fa4` and `fa5` in some branches involving floating-point comparisons. To prevent this issue from reappearing, I would...
@nick-knight I originally tried just adding the output register to the clobber list of any floating-point load, e.g.

```
__asm__(FLT_LOAD "fa5, %1(%0)" : : "r"(alphaw), "I"(FLT_SIZE) : "fa5");
```

But...
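For context on the clobber-list approach: in GCC-style extended asm, a clobber only tells the compiler that the statement modifies a register. It does not reserve that register between separate `__asm__` statements, so the compiler remains free to use it in the code it generates in between. A minimal sketch of the alternative, passing values through operand constraints so the compiler picks the registers itself, might look like the following. The function name `load_and_scale` is hypothetical and not taken from the kernels discussed here, and the plain-C fallback exists only so the sketch builds on non-RISC-V targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only; not from the actual kernels. */
static float load_and_scale(const float *alpha, float x)
{
    assert(alpha != NULL);
#if defined(__riscv) && defined(__riscv_flen)
    float a, result;
    /* "=f" asks the compiler to choose a floating-point register for the
     * loaded value instead of hard-coding fa5. The "m" input tells the
     * compiler that the statement reads *alpha. */
    __asm__("flw %0, 0(%1)" : "=f"(a) : "r"(alpha), "m"(*alpha));
    /* The value reaches the next statement through the C variable `a`,
     * so the compiler knows it is live and will not reuse its register. */
    __asm__("fmul.s %0, %1, %2" : "=f"(result) : "f"(a), "f"(x));
    return result;
#else
    /* Assumption: portable fallback so the sketch compiles elsewhere. */
    return *alpha * x;
#endif
}
```

With this shape, a call such as `load_and_scale(&alpha, 3.0f)` with `alpha` set to `2.0f` returns `6.0f`, and no fixed register name appears in the source for the compiler to collide with.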
@devinamatthews How would you like to proceed? There are a few short-term solutions we discussed above. Longer-term, I'd like to rewrite the inline assembly files to be more robust (probably...
Mind if we sync up in a week or two? I'll start working on it this week and hopefully by then I'll have a sense of how much more time...
@devinamatthews I'm steadily working through cleaning up all the kernels, but I don't think I'll be able to finish it in the next two weeks. I'm also trying to balance...
@devinamatthews Thanks for the hard work! I saw only one thing, which is to call the vectorized packing kernels when `cdim