Maybe stop generating `rep ret` on x86
A recent AMD software optimization guide (e.g., https://developer.amd.com/wp-content/resources/56305_SOG_3.00_PUB.pdf) says:
2.8.1.3.2 REP RET

For prior processor families, such as Family 10h and 12h, a three-byte return-immediate RET instruction had been recommended as an optimization to improve performance over a single-byte near-return. For processor Families 15h, 16h, and 17h this is no longer recommended and a single-byte near-return (opcode C3h) can be used with no negative performance impact. This will result in smaller code size over the three-byte method. For the rationale for the former recommendation, see section 6.2 in the Software Optimization Guide for AMD Family 10h and 12h Processors.
Family 10h is the K10 family, which was apparently discontinued in 2012 (per https://en.wikipedia.org/wiki/AMD_10h).
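For reference, the encodings involved are `C3` (`ret`), `F3 C3` (`rep ret`), and `C2 00 00` (`ret 0`, the three-byte return-immediate form). A rough way to gauge how much code this affects is to count `F3 C3` pairs in a blob of generated machine code. The sketch below assumes a hypothetical helper `count_rep_ret`, which is not part of any compiler or tool. It is a naive byte scan rather than a disassembler, so it can over-count when those two bytes happen to appear inside another instruction or an immediate.

```c
#include <stddef.h>

/* Encodings under discussion (x86 / x86-64):
 *   C3        ret       single-byte near return
 *   F3 C3     rep ret   REP prefix followed by ret
 *   C2 00 00  ret 0     three-byte return-immediate
 */

/* Hypothetical helper: count F3 C3 byte pairs in a buffer of machine code.
 * This is a plain byte scan, not a disassembler, so a match may be a false
 * positive if the bytes belong to some other instruction or to data. */
size_t count_rep_ret(const unsigned char *code, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (code[i] == 0xF3 && code[i + 1] == 0xC3) {
            count++;
            i++; /* skip the C3 byte that was just matched */
        }
    }
    return count;
}
```

Each match would shrink by one byte if the compiler emitted a plain `ret` instead.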