Travis Downs
> I choice set the UARCH_BENCH_CLOCK_MHZ because I can get very stable clock for Intel(add idle=poll in kernel cmdline and set 4 GHzin bios) and AMD(disable turbo clock in linux...
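As a rough illustration of why a pinned frequency is convenient: with a known, stable clock, a measured wall-clock time converts directly into cycles. The sketch below assumes the frequency is given in MHz, as the `UARCH_BENCH_CLOCK_MHZ` name suggests. `ns_to_cycles` is a hypothetical helper for illustration, not a function from uarch-bench.

```c
/* Hypothetical helper (not part of uarch-bench): convert a measured time
 * in nanoseconds to CPU cycles, assuming the core ran at a fixed,
 * known frequency for the whole measurement. */
double ns_to_cycles(double ns, double clock_mhz) {
    /* clock_mhz MHz = clock_mhz cycles per microsecond
     *               = clock_mhz / 1000 cycles per nanosecond */
    return ns * clock_mhz / 1000.0;
}
```

For example, at a fixed 4000 MHz, an operation measured at 0.75 ns per iteration corresponds to 3 cycles. If the real clock drifts from the assumed value (turbo, throttling), the converted cycle counts are off by the same ratio.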
Just linking in some additional comments on OSX support in an [otherwise unrelated merge request](https://github.com/travisdowns/uarch-bench/pull/29#issuecomment-366808430).
> and I have another question related. According Intel, when idle=poll, that is mean the CPU keep run NOPs, will that cause the test results not right? It would not...
From [Twitter](https://twitter.com/_monoid/status/1288171343184371713):

> Yeah, that's my point. If you look at load-store-load-store... chain, counters show that as loads are not replayed, the "replay horizon" is just the single store insn,...
> Ooh, and I can stick an ALU op in the dependency chain without increasing overall latency, making best-case forwarding latency 3 cycles on SNB and IVB: So this loop...
> Amazingly, yes! Yeah, I think that indicates best-case latency is 3 cycles on those chips too, better than I had thought (not that I ever tested it: it's just...
You can check out e.g. the `fwd_lat_delay_0` and `fwd_lat_delay_1` tests in uarch-bench for an example. BTW, this effect is a cause of several "I added extra code and things _sped up_"...
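For readers who have not seen this kind of test, here is a minimal sketch of the shape of a store-to-load forwarding chain: each store depends on the previous load, and each load reads the location just stored. This is not the uarch-bench implementation (real tests of this kind are normally written in assembly so the exact instruction sequence is controlled). `store_load_chain` is a hypothetical name, and using `volatile` to force the memory accesses is an assumption about how to approximate the loop in C.

```c
#include <stdint.h>

/* Hypothetical sketch, not taken from uarch-bench.
 * Builds a store -> load -> ALU op -> store ... dependency chain. */
uint64_t store_load_chain(uint64_t iters) {
    /* volatile forces the compiler to emit an actual store and an
     * actual load on every iteration instead of keeping x in a register */
    volatile uint64_t slot = 0;
    uint64_t x = 0;
    for (uint64_t i = 0; i < iters; i++) {
        slot = x;       /* store: data depends on the previous load */
        x = slot + 1;   /* load from the same address, plus one ALU op in the chain */
    }
    return x;           /* equals iters, so the chain cannot be optimized away */
}
```

Timing this loop and dividing by `iters` gives an estimate of the per-iteration latency of the whole chain (forwarding plus the one ALU op), not of forwarding alone. The compiler is free to schedule the surrounding loop overhead as it likes, so treat numbers from a C version as approximate.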
BTW, if you want to run the basic store forwarding tests in uarch-bench, they are available at: `./uarch-bench.sh --test-name=memory/store-fwd/*` Which gives results on my Skylake like: ~~~ ** Running...
Some pointers to reading perf events in Windows with ETW:

https://twitter.com/pervognsen/status/1270555130706841600

https://gist.github.com/pervognsen/73597a3a732a10922418d62c6c86a427

SO: https://stackoverflow.com/questions/45428588/can-i-read-the-cpu-performance-counters-from-a-user-mode-program-in-windows

Bruce's blog: https://randomascii.wordpress.com/2016/11/27/cpu-performance-counters-on-windows/
https://twitter.com/SMT_Solvers/status/1413708136704315398

https://gist.github.com/kruxmeier/eb4becb7ba5c16192274f6fce3a47309