RandomX v2 virtual machine changes

Open tevador opened this issue 10 months ago • 20 comments

  • CFROUND becomes conditional with a 1/16 chance of writing into fprc
  • F and E registers are mixed together with AES instead of XOR
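To illustrate the first change, here is a minimal sketch of a conditional CFROUND. It assumes the v1 behaviour (rotate the source register right by `imm32` and take the low 2 bits as the new rounding mode) and models the 1/16 chance by requiring four other bits of the rotated value to be zero. Which bits the PR actually tests is not stated in this thread, so the choice of bits 2..5 and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotr64(value: int, count: int) -> int:
    """Rotate a 64-bit value right by count bits (count taken modulo 64)."""
    count &= 63
    return ((value >> count) | (value << (64 - count))) & MASK64


def cfround_v1(src: int, imm32: int) -> int:
    """v1 behaviour: fprc is always replaced by the low 2 bits of the rotated source."""
    return rotr64(src, imm32) & 3


def cfround_v2_sketch(src: int, imm32: int, fprc: int) -> int:
    """Conditional variant: returns the new fprc, or the old one if no write happens."""
    rotated = rotr64(src, imm32)
    # ASSUMPTION: the write happens only when bits 2..5 of the rotated value
    # are all zero, which is true for 1 in 16 uniformly random inputs.
    if ((rotated >> 2) & 0xF) == 0:
        return rotated & 3
    return fprc  # rounding mode left unchanged


def write_fraction(samples: int = 1 << 16) -> float:
    """Fraction of source values 0..samples-1 that trigger a write (imm32 = 0)."""
    writes = sum(1 for s in range(samples) if ((s >> 2) & 0xF) == 0)
    return writes / samples
```

Under this assumption, 15 out of 16 CFROUND instructions leave the rounding mode untouched, so the CPU changes its floating-point control state far less often.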

This PR is incomplete. Currently, only the X86 and portable versions work; hardware AES is needed with JIT, and the changes are hardcoded. But it's enough to run some benchmarks.

  • [x] CFROUND changes in the interpreter
  • [x] CFROUND changes in X86 JIT compiler
  • [ ] CFROUND changes in A64 JIT compiler
  • [ ] V1/V2 selectable at runtime
  • [x] New portable intrinsics
  • [x] New X86 intrinsics
  • [ ] New ARM64 intrinsics
  • [ ] New PowerPC intrinsics
  • [x] Support for soft AES in X86 JIT compiler
  • [ ] Support for soft AES in A64 JIT compiler
  • [ ] Update documentation
  • [ ] Update tests

tevador avatar Sep 08 '23 21:09 tevador

Tested on Ryzen 7 1700 (Zen 1) with 2 threads running on the same core:

Algorithm | Hashrate
RandomX | 926.4 h/s
RandomX + CFROUND tweak | 1004.2 h/s
RandomX v2 (CFROUND and AES tweaks) | 1003.8 h/s

Summary for those who didn't read discussions on IRC:

  • CFROUND tweak makes RandomX more efficient (8.4% hashrate increase on Zen 1, expected 5-10% hashrate increase on other AMD CPUs)
  • AES tweak doubles the amount of AES computations per hash without hurting the hashrate (it uses the gap in RandomX main loop where CPU was sitting idle, waiting for scratchpad data)
  • AES tweak also introduces AES in the main RandomX loop, which makes it harder for specialized hardware to get away with just a dedicated circuit for scratchpad initialization - AES must be implemented as a part of RandomX VM and work with RandomX VM's registers
  • AES tweak also improves data entropy (makes it more random) before it's written to the scratchpad
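The percentages in this summary can be checked against the Zen 1 numbers posted above. A small sketch (the helper name `percent_change` is just for illustration):

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Hashrates (h/s) from the Ryzen 7 1700 measurement, 2 threads on one core.
v1 = 926.4
cfround_only = 1004.2
v2 = 1003.8

cfround_gain = percent_change(v1, cfround_only)   # gain from the CFROUND tweak alone
v2_gain = percent_change(v1, v2)                  # gain with both tweaks
aes_cost = percent_change(cfround_only, v2)       # effect of adding the AES tweak

print(f"CFROUND tweak: {cfround_gain:+.1f}%")
print(f"v2 total:      {v2_gain:+.1f}%")
print(f"AES tweak:     {aes_cost:+.2f}%")
```

The AES tweak changes the hashrate by about 0.04% on this machine, which is well within run-to-run noise.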

SChernykh avatar Sep 09 '23 09:09 SChernykh

RandomX V2 tests

git clone https://github.com/tevador/RandomX.git
cd RandomX
mkdir build && cd build
cmake -DARCH=native ..
make

./randomx-benchmark --mine --jit --largePages --threads 2 --affinity 3 --init 16

cd ..
git pull origin pull/274/head
cd build
cmake -DARCH=native ..
make

./randomx-benchmark --mine --jit --largePages --threads 2 --affinity 3 --init 16

Threadripper 3970X
Standard Performance: 1191.67 hashes per second
New Performance: 1250.43 hashes per second

5900X
Old Performance: 1525.68 hashes per second
New Performance: 1645.73 hashes per second

3900X
Old Performance: 1454.24 hashes per second
New Performance: 1561.44 hashes per second

model name : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
Old Performance: 375.845 hashes per second
New Performance: 374.699 hashes per second

model name : Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Old Performance: 1474.77 hashes per second
New Performance: 1472.4 hashes per second

Gingeropolous avatar Sep 09 '23 14:09 Gingeropolous

Ryzen 7 1700 in single thread mode: old 664.3 h/s, new 736.2 h/s.

SChernykh avatar Sep 09 '23 16:09 SChernykh

model name : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz

(this time Unthrottled)

Old Performance: 1250.65 hashes per second

New: Performance: 1225.8 hashes per second

--mine --jit --largePages --threads 1 --affinity 1 --init 16

Single thread:

Old: Performance: 655.031 hashes per second

New: Performance: 641.192 hashes per second

model name : Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz

Single thread:

Old Performance: 743.708 hashes per second

New: Performance: 739.415 hashes per second

Per @SChernykh's suggestion, I ran the tests 5 times and picked the highest (for i7-7700K):

Old Performance: 747.852 hashes per second

New 746.367 hashes per second on v2
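The "run several times and keep the best" approach can be scripted. The sketch below assumes the benchmark prints a line of the form `Performance: <number> hashes per second`, as quoted in this thread; the helper names are hypothetical and the rest of the benchmark's output is not relied upon.

```python
import re
import subprocess

# Matches the result line quoted in this thread.
PERF_RE = re.compile(r"Performance:\s*([0-9]+(?:\.[0-9]+)?)\s*hashes per second")


def parse_hashrate(output: str) -> float:
    """Extract the hashrate from one benchmark run's output."""
    match = PERF_RE.search(output)
    if match is None:
        raise ValueError("no 'Performance:' line found in benchmark output")
    return float(match.group(1))


def best_hashrate(outputs: list[str]) -> float:
    """Return the highest hashrate across several runs."""
    return max(parse_hashrate(out) for out in outputs)


def run_once(binary: str, args: list[str]) -> str:
    """Run the benchmark once and return its standard output."""
    result = subprocess.run([binary, *args], capture_output=True, text=True, check=True)
    return result.stdout


# Example using the two i7-7700K single-thread v1 figures reported above.
samples = [
    "Performance: 743.708 hashes per second",
    "Performance: 747.852 hashes per second",
]
best = best_hashrate(samples)
```

To collect real samples, something like `[run_once("./randomx-benchmark", ["--mine", "--jit", "--largePages", "--threads", "1", "--affinity", "1", "--init", "16"]) for _ in range(5)]` could be passed to `best_hashrate`. Taking the maximum rather than the mean filters out runs slowed down by throttling or background load.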

Gingeropolous avatar Sep 09 '23 19:09 Gingeropolous

I implemented software AES support in the JIT compiler. To test with software AES, the following line needs to be changed:

https://github.com/tevador/RandomX/blob/356b9ff22c81d04a37bcd75cc3b60e47db7c11cc/src/common.hpp#L125

Measured with Ryzen 3700X: ./randomx-benchmark --jit --verify --softAes --largePages

Old: 15.2843 ms per hash
New: 17.117 ms per hash

(Ran 5x and took the lowest result.)

So it seems there is a 10-11% performance hit for soft AES systems when doing light verification.
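The 10-11% figure refers to throughput. A short sketch of the conversion from the measured times per hash (variable names are illustrative):

```python
# Light verification times from the Ryzen 3700X soft AES measurement above.
old_ms = 15.2843
new_ms = 17.117

# Throughput is the inverse of the time per hash.
old_hps = 1000.0 / old_ms
new_hps = 1000.0 / new_ms

# Drop in hashes per second, in percent.
throughput_drop = (1.0 - new_hps / old_hps) * 100.0

# The same change expressed as extra time per hash, in percent.
time_increase = (new_ms / old_ms - 1.0) * 100.0

print(f"throughput drop: {throughput_drop:.1f}%")
print(f"time per hash increase: {time_increase:.1f}%")
```

The two numbers differ because one is relative to the old throughput and the other to the old time per hash: verification takes about 12% longer per hash, which corresponds to roughly 10.7% fewer hashes verified per second.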

tevador avatar Sep 11 '23 07:09 tevador

Ryzen 9 7950X: randomx-benchmark --mine --jit --largePages --threads 2 --affinity 3 --init 32

Old: 1635 h/s
New: 1763 h/s

And no measurable hashrate difference with and without AES tweak.

SChernykh avatar Sep 23 '23 19:09 SChernykh

@tevador Do you need help with aarch64? I can do it because I wrote that code originally, so I'm more familiar with it.

SChernykh avatar Sep 25 '23 07:09 SChernykh

Yes, it would be great if you could do the changes in the ARM64 JIT. But please wait, I realized the JitCompiler interface needs to be changed because the class cannot be a template. I'm working on a solution that would not cause cascading changes to other classes and it's a bit tricky. But I think updating the ARM assembly code should be safe for you to do now.

tevador avatar Sep 25 '23 08:09 tevador

Yes, I will only implement CFROUND and AES changes for A64 JIT compiler.

SChernykh avatar Sep 25 '23 08:09 SChernykh

Yes, it would be great if you could do the changes in the ARM64 JIT. But please wait, I realized the JitCompiler interface needs to be changed because the class cannot be a template. I'm working on a solution that would not cause cascading changes to other classes and it's a bit tricky. But I think updating the ARM assembly code should be safe for you to do now.

@tevador My WIP is here: https://github.com/SChernykh/RandomX/commits/v2 I think I found a solution for JitCompiler problem you mentioned. And I only have soft AES left to implement.

SChernykh avatar Sep 26 '23 17:09 SChernykh

macOS ARM

v2: 445.702 hashes per second
v1: 424.601 hashes per second

selsta avatar Sep 26 '23 18:09 selsta

@selsta can you run each test multiple times and take the highest number for v1 and v2? ARM CPUs never run at the same speed in most devices because of power saving.

SChernykh avatar Sep 26 '23 18:09 SChernykh

I did run it multiple times; while there was some variation, v2 was always faster by around 15-20 h/s.

selsta avatar Sep 26 '23 18:09 selsta

Hmm, that's interesting. So Apple silicon also gets a boost (but only 5%). Is it Apple M1 or M2?

SChernykh avatar Sep 26 '23 18:09 SChernykh

M1 Pro (8 performance cores, 2 efficiency cores)

selsta avatar Sep 26 '23 18:09 selsta

@tevador aarch64 is ready to be added: https://github.com/SChernykh/RandomX/tree/v2

SChernykh avatar Sep 28 '23 07:09 SChernykh

@tevador I squashed my commits, you can just cherry-pick https://github.com/SChernykh/RandomX/commit/67d1340856f44130a950b47dd693c7c49907c167 into your PR.

SChernykh avatar Oct 05 '23 11:10 SChernykh

I can't wait for the RandomX V2 :heart:

blackmennewstyle avatar Oct 30 '23 15:10 blackmennewstyle

@tevador Do you plan to finish it soon? What is left to be done?

SChernykh avatar Nov 17 '23 08:11 SChernykh

@tevador thank you for your work on the previous and this new version of RandomX!

We're working on a decentralized cloud and plan to use RandomX as a CPU capacity proof for every core of a capacity provider. It looks like RandomX is the only existing ASIC- and GPU-resistant solution for this task. We want to launch our network in the near future and are kind of dependent on this PR. Are there any time estimates for it? How stable is it now, and would you recommend using it for at least x86?

mikevoronov avatar Dec 23 '23 14:12 mikevoronov