llama: add compiler tags for cpu features
Support local builds with customized CPU flags for both the CPU runner and the GPU runners.
Some users want no vector flags in the GPU runners; others want nearly all of the vector extensions enabled. Each runner we add to the official build adds significant overhead (size and build time), so this enhancement makes it much easier for users to build their own customized version if our default runners (CPU: [none, avx, avx2]; GPU: [avx]) don't address their needs.
Fixes #2281 Fixes #2187 Fixes #2205
Note: this won't be readily available to end users until we merge jmorganca/llama to main in #5034.
~~Switching to draft since I need to test permutations on the new dynamic runner discovery logic.~~