Cannot use complex `bli_scal2s` with `&x == &y`
The level0 macros `bli_scal2s`, `bli_scal2js`, `bli_scal2ris`, and `bli_scal2jris` (collectively, y = alpha*conj?(x)) all boil down to the same implementation, which, for complex numbers, writes the real part of y while the real part of x is still needed to compute the imaginary part of y. Thus, if the same address or variable is used for both x and y, an incorrect result is computed.
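To make the hazard concrete, here is a minimal sketch in C. The `cplx` type, the `NAIVE_SCAL2S` macro, and `demo_naive` are hypothetical simplifications written for this illustration, not the actual BLIS definitions; they only mimic the write order described above (real part of y stored first, imaginary part computed afterwards).

```c
#include <stdio.h>

/* Hypothetical complex type for illustration (not BLIS's scomplex). */
typedef struct { float real; float imag; } cplx;

/* Hypothetical simplification of y = alpha * x: the real part of y is
   stored first, and the imaginary part is then computed from x.real.
   If x and y are the same object, x.real has already been overwritten
   by the time the second statement reads it. */
#define NAIVE_SCAL2S(a, x, y)                                    \
    do {                                                         \
        (y).real = (a).real * (x).real - (a).imag * (x).imag;    \
        (y).imag = (a).real * (x).imag + (a).imag * (x).real;    \
    } while (0)

/* Prints alpha * x computed out of place and then in place. */
static void demo_naive(void)
{
    cplx alpha = { 0.0f, 1.0f };  /* alpha = i      */
    cplx x     = { 1.0f, 2.0f };  /* x     = 1 + 2i */
    cplx y;

    NAIVE_SCAL2S(alpha, x, y);    /* distinct operands */
    printf("out of place: %g + %gi\n", y.real, y.imag);

    NAIVE_SCAL2S(alpha, x, x);    /* &x == &y */
    printf("in place:     %g + %gi\n", x.real, x.imag);
}
```

Calling `demo_naive()` prints `-2 + 1i` for the out-of-place case, which is the correct value of i(1 + 2i), but `-2 + -2i` for the in-place case, because the imaginary part was computed from the already-overwritten real part.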
Of course, `scal2s` with `&x == &y` is just `scals`, but there are situations where the single-operand in-place version cannot be used: for example, `bli_scal2v` can conjugate the x vector, while `bli_scalv` cannot (I will be submitting a separate issue about this).
To be completely general, all complex level0 macros should use temporary local variables to avoid writing to the output until after all inputs have been read.
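As a sketch of what "use temporary local variables" could look like, the hypothetical macros below (again not the actual BLIS code; `cplx`, `SAFE_SCAL2S`, and `SAFE_SCAL2JS` are names invented for this illustration) load every input into locals, compute both output components, and only then store to y. Reading alpha into locals as well keeps the macro correct if alpha is the same object as y.

```c
/* Hypothetical complex type for illustration (not BLIS's scomplex). */
typedef struct { float real; float imag; } cplx;

/* Hypothetical y = alpha * x: all inputs are read into locals before
   anything is stored, so x (or alpha) may alias y. */
#define SAFE_SCAL2S(a, x, y)                                     \
    do {                                                         \
        const float ar_ = (a).real, ai_ = (a).imag;              \
        const float xr_ = (x).real, xi_ = (x).imag;              \
        const float yr_ = ar_ * xr_ - ai_ * xi_;                 \
        const float yi_ = ar_ * xi_ + ai_ * xr_;                 \
        (y).real = yr_;                                          \
        (y).imag = yi_;                                          \
    } while (0)

/* Hypothetical y = alpha * conj(x), following the same read-then-write
   rule; conjugation is applied by negating the loaded imaginary part. */
#define SAFE_SCAL2JS(a, x, y)                                    \
    do {                                                         \
        const float ar_ = (a).real, ai_ = (a).imag;              \
        const float xr_ = (x).real, xi_ = -(x).imag;             \
        const float yr_ = ar_ * xr_ - ai_ * xi_;                 \
        const float yi_ = ar_ * xi_ + ai_ * xr_;                 \
        (y).real = yr_;                                          \
        (y).imag = yi_;                                          \
    } while (0)
```

With `cplx alpha = { 0.0f, 1.0f }, x = { 1.0f, 2.0f };`, `SAFE_SCAL2S(alpha, x, x)` leaves x = -2 + 1i, matching the out-of-place result, and `SAFE_SCAL2JS(alpha, x, x)` applied to the original x gives i(1 - 2i) = 2 + 1i. An optimizing compiler will typically keep these locals in registers, so the extra temporaries should cost little or nothing.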
@fgvanzee, if you get around to making a PR for this before I do, great; otherwise, I'll submit one in the next few weeks.