cpuminer-neoscrypt Added AVX1 support for salsa and chacha rounds


Open kangaderoo opened this issue 10 years ago • 2 comments

Code is in C for better maintainability. ASM derived from these files might increase speed slightly. The current speed increase compared to the SSE routines is about 10%.

Feb 15 '15 15:02 kangaderoo

Thanks, I plan to add the AVX/XOP assembly code in the future and may use your inline assembly as a reference. SSE2 4-way is also going to be improved.

Feb 17 '15 16:02 ghostlander

I was kind of wondering where your speed increase from the 4-way code comes from. I guess I still have to rewrite the KDF compress function in inline assembly. It looks like a good candidate to optimize to 4-way, or maybe 8-way, depending on how many XMM registers it needs.

The original CpuMiner had a 3-way scrypt and a 4-way SHA256, and the best result came from running 12-way on AVX1. The 3-way scrypt kept 3 'matrices' in XMM registers (4 registers each, so 12 of the 16 XMM registers available in 64-bit mode), leaving 4 XMM registers free for calculating functions etc. It seems that XMM/XMM (register-to-register) operations run about 3 times faster than XMM/memory operations.

Due to the mixing behavior of neoscrypt (4 times a 4x4 matrix), it looks like a 1-way implementation of salsa and chacha would need the fewest memory moves.
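To illustrate the 1-way layout described above, here is a minimal sketch of one ChaCha double round with the 4x4 matrix held in four XMM registers, one row per register. It is not the code from this pull request: it uses the standard ChaCha quarter-round, plain SSE2 intrinsics instead of inline assembly, and the function names are made up for the example. Building the same source with `-mavx` on GCC or Clang should make the compiler emit the VEX-encoded (AVX1) forms of these 128-bit operations. A scalar version is included only to check the vector one.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Rotate left by a constant: scalar and 4-lane vector forms. */
#define ROTL32(x, n)  (((x) << (n)) | ((x) >> (32 - (n))))
#define ROTL32V(x, n) \
    _mm_or_si128(_mm_slli_epi32((x), (n)), _mm_srli_epi32((x), 32 - (n)))

/* Standard ChaCha quarter-round applied to four lanes at once. */
#define QRV(a, b, c, d) do {                                        \
    a = _mm_add_epi32(a, b); d = _mm_xor_si128(d, a); d = ROTL32V(d, 16); \
    c = _mm_add_epi32(c, d); b = _mm_xor_si128(b, c); b = ROTL32V(b, 12); \
    a = _mm_add_epi32(a, b); d = _mm_xor_si128(d, a); d = ROTL32V(d, 8);  \
    c = _mm_add_epi32(c, d); b = _mm_xor_si128(b, c); b = ROTL32V(b, 7);  \
} while (0)

/* Scalar reference quarter-round (hypothetical helper for checking). */
static void qr_scalar(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
    *a += *b; *d ^= *a; *d = ROTL32(*d, 8);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 7);
}

/* Scalar double round: four column rounds, then four diagonal rounds. */
void chacha_doubleround_scalar(uint32_t x[16])
{
    qr_scalar(&x[0], &x[4], &x[8],  &x[12]);
    qr_scalar(&x[1], &x[5], &x[9],  &x[13]);
    qr_scalar(&x[2], &x[6], &x[10], &x[14]);
    qr_scalar(&x[3], &x[7], &x[11], &x[15]);
    qr_scalar(&x[0], &x[5], &x[10], &x[15]);
    qr_scalar(&x[1], &x[6], &x[11], &x[12]);
    qr_scalar(&x[2], &x[7], &x[8],  &x[13]);
    qr_scalar(&x[3], &x[4], &x[9],  &x[14]);
}

/* 1-way vector double round: the whole state lives in 4 XMM registers. */
void chacha_doubleround_sse2(uint32_t x[16])
{
    __m128i a = _mm_loadu_si128((const __m128i *)&x[0]);
    __m128i b = _mm_loadu_si128((const __m128i *)&x[4]);
    __m128i c = _mm_loadu_si128((const __m128i *)&x[8]);
    __m128i d = _mm_loadu_si128((const __m128i *)&x[12]);

    QRV(a, b, c, d);                                  /* column round */

    /* Rotate rows so the diagonals line up as columns. */
    b = _mm_shuffle_epi32(b, _MM_SHUFFLE(0, 3, 2, 1));
    c = _mm_shuffle_epi32(c, _MM_SHUFFLE(1, 0, 3, 2));
    d = _mm_shuffle_epi32(d, _MM_SHUFFLE(2, 1, 0, 3));

    QRV(a, b, c, d);                                  /* diagonal round */

    /* Undo the row rotation. */
    b = _mm_shuffle_epi32(b, _MM_SHUFFLE(2, 1, 0, 3));
    c = _mm_shuffle_epi32(c, _MM_SHUFFLE(1, 0, 3, 2));
    d = _mm_shuffle_epi32(d, _MM_SHUFFLE(0, 3, 2, 1));

    _mm_storeu_si128((__m128i *)&x[0],  a);
    _mm_storeu_si128((__m128i *)&x[4],  b);
    _mm_storeu_si128((__m128i *)&x[8],  c);
    _mm_storeu_si128((__m128i *)&x[12], d);
}

/* Returns 1 if the vector and scalar double rounds agree on a test state. */
int chacha_doubleround_selftest(void)
{
    uint32_t s[16], v[16];
    for (uint32_t i = 0; i < 16; i++)
        s[i] = v[i] = 0x9E3779B9u * (i + 1);
    chacha_doubleround_scalar(s);
    chacha_doubleround_sse2(v);
    return memcmp(s, v, sizeof s) == 0;
}
```

Between the load and the store, all work is register-to-register: the only data movement is the three lane shuffles before and after the diagonal round. That is the property the 1-way argument relies on; whether it beats a multi-way layout in practice would still have to be measured.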

Unfortunately my development environment doesn't have AVX2, but the inline assembly code could easily be rewritten to use the 256-bit YMM registers.

John Doering wrote on 2/17/2015 at 5:14 PM:

Thanks, I plan to add the AVX/XOP assembly code in the future and may use your inline assembly as a reference. SSE2 4-way is also going to be improved.

— Reply to this email directly or view it on GitHub https://github.com/ghostlander/cpuminer-neoscrypt/pull/1#issuecomment-74694580.

Feb 18 '15 10:02 kangaderoo
