POWER micro-architecture dispatch
This implements Linux-specific, compiler-neutral micro-architecture selection, unlike the simpler (possibly gcc-specific) version you didn't like. It currently affects only the compilation of generic code.
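For readers unfamiliar with the idea, here is a minimal sketch of what a Linux-specific, compiler-neutral query can look like. It assumes the platform string from the ELF auxiliary vector (`getauxval(AT_PLATFORM)`) is the source of the micro-architecture name; that is one possible mechanism and not necessarily what the commits in this PR do. The enum and function names are hypothetical and are not BLIS identifiers.

```c
#include <string.h>
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_PLATFORM */
#endif

/* Hypothetical identifiers, for illustration only (not BLIS names). */
typedef enum {
    POWER_ARCH_GENERIC,
    POWER_ARCH_POWER7,
    POWER_ARCH_POWER8,
    POWER_ARCH_POWER9,
    POWER_ARCH_POWER10
} power_arch_t;

/* Map a platform string such as "power9" to an architecture id.
 * Anything unrecognised (including NULL) falls back to generic. */
power_arch_t power_arch_from_platform(const char *platform)
{
    static const struct { const char *name; power_arch_t arch; } table[] = {
        { "power7",  POWER_ARCH_POWER7  },
        { "power7+", POWER_ARCH_POWER7  },
        { "power8",  POWER_ARCH_POWER8  },
        { "power9",  POWER_ARCH_POWER9  },
        { "power10", POWER_ARCH_POWER10 },
    };
    size_t i;

    if (platform == NULL)
        return POWER_ARCH_GENERIC;

    /* Exact matches only, so "power10" is never mistaken for "power1". */
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(platform, table[i].name) == 0)
            return table[i].arch;

    return POWER_ARCH_GENERIC;
}

/* Ask the kernel which platform we are running on. This uses no
 * compiler builtins, so it does not depend on gcc; it does depend
 * on Linux, and falls back to generic elsewhere. */
power_arch_t power_arch_query(void)
{
#if defined(__linux__)
    /* getauxval() returns 0 when the entry is absent, i.e. NULL here. */
    return power_arch_from_platform((const char *)getauxval(AT_PLATFORM));
#else
    return POWER_ARCH_GENERIC;
#endif
}
```

Keeping the string mapping separate from the `getauxval()` call means the lookup can be exercised on any machine, which matters when power9 or later hardware is not at hand.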
Note that there's a strange problem that I've had to kludge round by lowering the optimization levels, as noted in a log message, but it seems worth recording this now. I can currently only test with Fedora 28's gcc 8.3 (on power8), but I will try to get the system updated to a supported Fedora. I might be able to get access to power9, but haven't tried.
This was branched from the arm64 equivalent, as there was originally a merge clash with bits I removed.
I should have referenced #322.
@loveshack Thanks for your efforts, Dave. This is an unusually busy time for us, so it will be a bit before I can take a look at these modifications.
In the meantime, if you aren't already doing so, I would suggest looking into the AppVeyor failures to see if those are related to the commits on this PR branch or if (perhaps less likely) they are being caused by something else (environment/hardware changes at AppVeyor, for example).
I fixed the failure, after some confusion over why it seemed to be MS Windows-specific. I don't know how the export spec got lost.
@loveshack I resolved several conflicts, but I think it probably needs some retesting to be sure I didn't break anything.
@fgvanzee we should take another look at this, especially if @nicholaiTukanov's Power9 code is in a finished state.
By the way, in case anyone is interested, OpenBLAS has a POWER10 kernel that uses the processor's matrix-multiply feature, not that the hardware is available yet.
Somehow, a notification of Devin's reviews made it into my inbox. Anyhow, while I'm here...