[zstd][cli] Add performance counters support to bench mode
**NOT FOR LANDING**
This patch adds an extra parameter (-y) to benchmark mode that collects processor performance counters, which makes it possible to report performance stats per operation (i.e. compression vs. decompression).
We can collect the following performance counters using the Linux perf API: CPU cycles, instructions, branch misses, cache hits and cache misses.
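A minimal sketch of how these counters could be opened through the Linux perf API, i.e. the perf_event_open(2) system call. This is not the patch's code: bench_events, bench_init_attr, bench_open_counter and bench_open_all are hypothetical names. It also assumes "cache hits" are derived from the generic cache-references event minus cache misses, since the kernel exposes no generic "cache hits" event. Each counter starts disabled and excludes kernel and hypervisor time, which is what perf_event_paranoid level 2 (the upstream default) permits for unprivileged users.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Generic hardware events matching the counters listed above. The kernel
 * maps each one onto the PMU of the CPU it is running on. There is no
 * generic "cache hits" event, so this sketch (assumption) collects cache
 * references and cache misses (hits ~= references - misses). */
static const uint64_t bench_events[] = {
    PERF_COUNT_HW_CPU_CYCLES,
    PERF_COUNT_HW_INSTRUCTIONS,
    PERF_COUNT_HW_BRANCH_MISSES,
    PERF_COUNT_HW_CACHE_REFERENCES,
    PERF_COUNT_HW_CACHE_MISSES,
};
#define N_BENCH_EVENTS (sizeof(bench_events) / sizeof(bench_events[0]))

/* Describe one user-space-only hardware counter that starts disabled, so it
 * can be switched on just around the code being measured. */
static void bench_init_attr(struct perf_event_attr *attr, uint64_t config)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = config;
    attr->disabled = 1;
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
}

/* glibc has no perf_event_open() wrapper, so go through syscall(2).
 * pid = 0 and cpu = -1: count the calling thread on any CPU. */
static int bench_open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    bench_init_attr(&attr, config);
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Open every counter. Unsupported or disallowed events are left as -1;
 * returns how many counters were opened. */
static size_t bench_open_all(int fds[N_BENCH_EVENTS])
{
    size_t opened = 0;
    for (size_t i = 0; i < N_BENCH_EVENTS; i++) {
        fds[i] = bench_open_counter(bench_events[i]);
        if (fds[i] >= 0)
            opened++;
    }
    return opened;
}
```

Because PERF_TYPE_HARDWARE events are generic, the same source builds unchanged on x86-64, Arm and RISC-V. Whether a given event actually opens depends on the kernel's PMU support for that CPU (virtual machines often expose none), so a caller would reasonably treat -1 entries as "not available" rather than as a fatal error.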
One advantage of using the Linux perf API is that it should work on any processor for which the Linux kernel exposes hardware counters, and therefore on x86-64 (Intel and AMD), Arm (arm32/aarch64) and RISC-V.
The counters make it possible to derive new stats such as cycles/byte, a metric that is useful for comparing different CPU microarchitectures because it is independent of clock speed.
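The derived stat is a simple ratio. The sketch below uses hypothetical helper names (not existing zstd functions): it computes cycles/byte from the cycle count and the number of input bytes processed in the loop, and shows that converting it back into a throughput requires a clock frequency, which is exactly the term cycles/byte leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cycles per input byte: depends on the code and the microarchitecture,
 * but not (to a first approximation, memory-bound phases aside) on the
 * clock speed, since the same work takes about the same number of cycles
 * at 2 GHz or 4 GHz; only the wall-clock time changes. */
static double cycles_per_byte(uint64_t cycles, size_t bytes)
{
    assert(bytes > 0);
    return (double)cycles / (double)bytes;
}

/* Going back to a throughput requires a clock frequency:
 * bytes/s = Hz / (cycles/byte), reported here in MB/s (10^6 bytes). */
static double throughput_mb_per_s(double cpb, double freq_hz)
{
    assert(cpb > 0.0);
    return freq_hz / cpb / 1e6;
}
```

For illustration only (made-up numbers, not measurements): 250 million cycles to compress 100 MB (10^8 bytes) is 2.5 cycles/byte, which corresponds to 1200 MB/s on a core running at 3 GHz.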
Plus, since counters are only captured during the main benchmark loop, cycles spent on I/O (e.g. reading files from disk), which a plain 'perf stat' run would include, are not counted.
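Restricting measurement to the benchmark loop comes down to resetting and enabling the counter just before the loop and disabling it right after, using the perf ioctls. A minimal sketch, assuming fd was opened with perf_event_open(2) with attr.disabled = 1 and no read_format flags; bench_count_loop and bench_fn are hypothetical names, and fn stands in for one compression or decompression pass.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook standing in for one benchmark iteration,
 * e.g. one compression or decompression pass over the loaded input. */
typedef void (*bench_fn)(void *ctx);

/* Count only the benchmark loop: the counter behind `fd` (opened disabled,
 * without read_format flags) is reset and enabled right before the loop and
 * disabled right after it, so file loading and other setup done before this
 * call never reaches the count. Returns 0 and stores the count in *value,
 * or -1 if any ioctl/read fails. */
static int bench_count_loop(int fd, bench_fn fn, void *ctx, unsigned loops,
                            uint64_t *value)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1)
        return -1;
    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == -1)
        return -1;
    for (unsigned i = 0; i < loops; i++)
        fn(ctx);
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    /* With read_format == 0, read() returns a single 64-bit counter value. */
    if (read(fd, value, sizeof(*value)) != (ssize_t)sizeof(*value))
        return -1;
    return 0;
}
```

With several counters, one option would be to place them in a single perf event group and pass PERF_IOC_FLAG_GROUP to these ioctls so that all of them start and stop together; whether the patch does so is not stated here.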
This patch is still at an early stage: the goal is to gather feedback and properly address its current shortcomings, so that it can progress towards a contribution that can land in zstd.