hardware-effects

Additional hardware effects ideas

Open Kobzol opened this issue 6 years ago • 8 comments

  • [x] non-temporal stores
  • [x] multiple threads saturating the memory bus
  • [x] hardware prefetching with indexed accesses
  • [x] floating point handling (denormals etc.)
  • [x] 4k aliasing
  • [x] store buffer capacity
  • [ ] instruction cache misses
  • [ ] TLB misses
  • [ ] more multithreading examples (lock contention etc.)
  • [ ] vector instructions
  • [ ] critical word load
  • [ ] CUDA examples

Kobzol avatar Nov 19 '18 15:11 Kobzol

some hardware effects I have come across:

  • [ ] loop optimized (or not) by Loop Stream Detector instruction queue replay
  • [ ] loop/branch misalignment
  • [ ] macro-fusable ops split on cache line boundary (Intel Core Architectures, Nehalem and newer)

martisch avatar Nov 19 '18 15:11 martisch

@martisch The third one sounds extra juicy :) But also very CPU-specific I guess. If you have more specific ideas on how to demonstrate those effects, please do share :)

Kobzol avatar Nov 19 '18 15:11 Kobzol

FWIW, through testing I haven't been able to find any evidence of critical word first (CWF) on modern processors, but I would be very interested in any test that shows it.

I can't really wrap my mind around how CWF would actually work on a system that has a 64-byte bus between L2 and L1 (like Skylake and later Intel CPUs): this implies that the entire 64-byte cache line goes from L2 to L1 in a single transfer, so no word is "first": they all arrive at the same time.

Even with smaller buses, like 16 or 32 bytes, it seems like the opportunity for CWF is very limited: probably only a cycle difference between the first and second half.

I asked on RWT about whether CWF is still used - but I got both "yes" and "no" answers and no solid conclusion.
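A minimal sketch of the kind of test that could probe this, assuming a 64-byte cache line: chase pointers through a random cycle of cache lines, with the "next" pointer stored at a chosen byte offset inside each line. Comparing offset 0 against offset 56 on a working set that misses L1 shows whether the position of the requested word matters. A measurable gap would suggest the line arrives in pieces in a fixed order; no gap is consistent with either CWF or a full-width transfer, so it would not settle the question by itself. The names `Chain`, `build_chain`, `chase` and `ns_per_load` are hypothetical and are not taken from this repository.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <numeric>
#include <random>
#include <vector>

constexpr std::size_t kLineSize = 64;  // assumption: 64-byte cache lines

struct Chain {
    std::vector<unsigned char> storage;  // owns the memory
    unsigned char* base = nullptr;       // first cache-line-aligned byte in storage
    std::size_t line_count = 0;
    std::size_t offset = 0;              // where the "next" pointer lives in each line
};

// Links `line_count` cache lines into one random cycle. Each line stores the
// address of the next line at byte `offset`.
Chain build_chain(std::size_t line_count, std::size_t offset, unsigned seed = 42) {
    assert(line_count > 0);
    assert(offset + sizeof(unsigned char*) <= kLineSize);

    Chain c;
    c.storage.resize((line_count + 1) * kLineSize);
    const auto addr = reinterpret_cast<std::uintptr_t>(c.storage.data());
    c.base = c.storage.data() + (kLineSize - addr % kLineSize) % kLineSize;
    c.line_count = line_count;
    c.offset = offset;

    // A random visiting order defeats simple stride prefetchers.
    std::vector<std::size_t> order(line_count);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    for (std::size_t i = 0; i < line_count; ++i) {
        unsigned char* from = c.base + order[i] * kLineSize;
        unsigned char* to = c.base + order[(i + 1) % line_count] * kLineSize;
        std::memcpy(from + offset, &to, sizeof(to));
    }
    return c;
}

// Follows `steps` pointers starting at the first line. Every load depends on
// the previous one, so the loop is bound by load latency.
unsigned char* chase(const Chain& c, std::size_t steps) {
    unsigned char* p = c.base;
    for (std::size_t i = 0; i < steps; ++i) {
        std::memcpy(&p, p + c.offset, sizeof(p));
    }
    return p;
}

// Average time per dependent load, in nanoseconds.
double ns_per_load(const Chain& c, std::size_t steps) {
    assert(steps > 0);
    const auto start = std::chrono::steady_clock::now();
    unsigned char* volatile sink = chase(c, steps);  // keep the result alive
    const auto end = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(end - start).count() /
           static_cast<double>(steps);
}
```

To use it, build two chains of the same size (for example `build_chain(n, 0)` and `build_chain(n, 56)`, with `n` chosen so the working set fits in L2 but not in L1), call `ns_per_load` on each several times, and compare the medians. Compile with optimizations enabled; the result is noisy and very CPU-specific.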

travisdowns avatar Nov 22 '18 17:11 travisdowns

I spent a few hours yesterday trying to simulate it, without success - but I'm no expert :) The items on the list here are just random ideas/keywords taken from the web; I have no idea whether some of them can be demonstrated consistently at all.

I would expect the RAM controller to reorder some of the data it sends to the CPU, but I have no idea whether that is also done in the caches. I think it was mentioned in the memory paper ("What Every Programmer Should Know About Memory", https://akkadia.org/drepper/cpumemory.pdf), but I don't remember exactly.

Kobzol avatar Nov 22 '18 17:11 Kobzol

One effect you might consider is demonstrating store buffer capacity.

I tried this (https://github.com/nicknash/GuessStoreBuffer - pretty horrible sorry!), and wrote what I understood to be going on at my blog (https://nicknash.me/2018/04/07/speculating-about-store-buffer-capacity/) - any corrections very welcome!

I could code up a much neater C++ version if you like.
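As a rough illustration of the idea (a sketch under stated assumptions, not the code from GuessStoreBuffer or from this repository): time a loop body that issues N stores followed by a fixed chunk of non-memory work, and sweep N. The guess is that the time per round starts rising more steeply once N exceeds what the store buffer can hold. The names `time_stores` and `sweep`, the filler length `kFillerOps`, and the 64-byte stride are all assumptions made for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kFillerOps = 200;  // assumption: arbitrary amount of non-memory work
constexpr std::size_t kStride = 16;      // 16 * 4 bytes = 64 bytes, one store per assumed cache line

// Runs `iterations` rounds of `store_count` stores followed by a dependent
// chain of multiply-adds, and returns the average time per round in ns.
double time_stores(std::size_t store_count, std::size_t iterations) {
    assert(store_count > 0 && iterations > 0);

    std::vector<std::uint32_t> buffer(store_count * kStride);
    volatile std::uint32_t* out = buffer.data();  // volatile: stores must not be elided
    std::uint64_t filler = 1;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        for (std::size_t s = 0; s < store_count; ++s) {
            out[s * kStride] = static_cast<std::uint32_t>(i);
        }
        // Each step depends on the previous one, so this part is latency-bound
        // and touches no memory.
        for (std::size_t k = 0; k < kFillerOps; ++k) {
            filler = filler * 6364136223846793005ULL + 1442695040888963407ULL;
        }
    }
    const auto end = std::chrono::steady_clock::now();

    volatile std::uint64_t sink = filler;  // keep the filler chain alive
    (void)sink;
    return std::chrono::duration<double, std::nano>(end - start).count() /
           static_cast<double>(iterations);
}

// Measures store counts 1..max_stores; element i holds the time for i + 1 stores.
std::vector<double> sweep(std::size_t max_stores, std::size_t iterations) {
    std::vector<double> result;
    result.reserve(max_stores);
    for (std::size_t n = 1; n <= max_stores; ++n) {
        result.push_back(time_stores(n, iterations));
    }
    return result;
}
```

Printing the output of something like `sweep(80, 100000)` and looking for a knee in the curve gives a guess at the capacity. The inner store loop adds its own overhead, so the knee is blurrier than with a fully unrolled or generated sequence of stores, and its position depends on the CPU and compiler flags.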

nicknash avatar Jan 16 '19 09:01 nicknash

@nicknash Hi, sorry for the late response. That is an awesome article and experiment! If you could prepare a C++ version similar to the ones that are already in this repo, I'd be happy to merge it. Please create a PR if you're interested and we can discuss it there.

Kobzol avatar Jan 19 '19 09:01 Kobzol

@nicknash I took the liberty of adding this example myself, I mentioned your comment and blog post (which is very cool BTW :) ). https://github.com/Kobzol/hardware-effects/commit/c2627f838d6fa788866982cc9412c15fe5dcc4b6

Kobzol avatar May 17 '19 15:05 Kobzol

@kobzol, that’s cool! It has been sitting on my todo list for much too long.

nicknash avatar May 17 '19 18:05 nicknash