fibers

Change push+ret in set_context to indirect jmp

Open mrakh opened this issue 3 years ago • 2 comments

This snippet of code in your set_context subroutine:

  pushq %r8
  xorl %eax, %eax
  ret

should be changed to:

  xorl %eax, %eax
  jmp *%r8

And likewise with swap_context.

Modern Intel and AMD CPU microarchitectures have a return stack buffer (RSB) that records the return address of every call so the CPU can speculatively execute past the matching ret. In set_context the address on top of the stack was put there by pushq, not by a call, so the RSB's prediction for that ret will generally not match the real target. The ret is then mispredicted, the pipeline has to be flushed, and that will seriously hurt your performance. By contrast, jmp *%r8 is predicted by the indirect branch predictor, which is likely to have a non-zero hit rate.
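For anyone who wants to measure this in isolation, here is a minimal sketch, not taken from this repository. It assumes x86-64, GCC or Clang, and an ELF target such as Linux. The names via_push_ret, via_jmp, landing_pad and ns_per_call are hypothetical, and %r8 is loaded from the first argument as a stand-in for the saved instruction pointer that set_context would load from a context.

```c
/* Sketch only: x86-64, GCC/Clang, ELF (e.g. Linux), AT&T syntax.
   Every symbol here is hypothetical and is not part of the fibers code. */
#include <time.h>

/* Routines defined in the assembly block below. */
int landing_pad(void);
int via_push_ret(int (*target)(void));
int via_jmp(int (*target)(void));

__asm__(
    ".pushsection .text\n"

    /* Common destination: sets the result and returns to the original
       caller, whose return address is still on top of the stack. */
    ".p2align 4\n"
    ".globl landing_pad\n"
    ".type landing_pad, @function\n"
    "landing_pad:\n"
    "    movl $7, %eax\n"
    "    ret\n"

    /* Current sequence: push the target, then ret to it. The RSB predicts
       a return to the caller, but the real target is the address in %r8. */
    ".p2align 4\n"
    ".globl via_push_ret\n"
    ".type via_push_ret, @function\n"
    "via_push_ret:\n"
    "    movq %rdi, %r8\n"
    "    pushq %r8\n"
    "    xorl %eax, %eax\n"
    "    ret\n"

    /* Proposed sequence: indirect jump, which leaves the RSB untouched and
       is handled by the indirect branch predictor. */
    ".p2align 4\n"
    ".globl via_jmp\n"
    ".type via_jmp, @function\n"
    "via_jmp:\n"
    "    movq %rdi, %r8\n"
    "    xorl %eax, %eax\n"
    "    jmp *%r8\n"

    ".popsection\n"
);

/* Average CPU time per call in nanoseconds, measured with clock(). */
static double ns_per_call(int (*f)(int (*)(void)), long iters)
{
    volatile int sink = 0;  /* keeps the loop from being optimized away */
    clock_t start = clock();
    for (long i = 0; i < iters; i++)
        sink += f(landing_pad);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Calling ns_per_call(via_push_ret, n) and ns_per_call(via_jmp, n) from main with a large n and comparing the two gives a rough per-CPU number. The jump target never changes in this sketch, so it is the best case for the indirect branch predictor; a real scheduler switching between many fibers will probably see a lower hit rate.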

mrakh avatar Feb 25 '22 20:02 mrakh

I can confirm that in my tests on an i5 650 (just swapping between two functions on one pinned thread and counting), jmp makes the entire function 50% faster.

overloader7 avatar Mar 24 '23 22:03 overloader7

https://blog.stuffedcow.net/2018/04/ras-microbenchmarks/

overloader7 avatar Apr 05 '23 15:04 overloader7