Artyom Pavlov
Could you benchmark the parallel implementation and compare it against the single-threaded one?
I don't have a strong opinion here, so I am fine with either, but my minor concern is that `rayon`-based multithreading may not always be an appropriate option. Imagine...
Oh, you are right. Fixed. BTW do we really need the 64-byte alignment in the first place? IIUC this alignment is too strict for SIMD vectors and it looks...
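As a rough illustration of the alignment question, here is a minimal sketch of what over-aligning a small buffer costs. `Align16` and `Align64` are hypothetical wrapper types made up for this example, not types from the code under review. It assumes a 16-byte payload, which is the size of a 128-bit vector.

```rust
use core::mem::{align_of, size_of};

/// Hypothetical 16-byte payload with the alignment a 128-bit vector needs.
#[allow(dead_code)]
#[repr(C, align(16))]
struct Align16([u8; 16]);

/// The same payload, but forced to 64-byte (cache line / 512-bit) alignment.
#[allow(dead_code)]
#[repr(C, align(64))]
struct Align64([u8; 16]);

fn main() {
    // A type's size is always a multiple of its alignment,
    // so the stricter alignment pads the 16-byte payload.
    println!(
        "Align16: size = {}, align = {}",
        size_of::<Align16>(),
        align_of::<Align16>()
    );
    println!(
        "Align64: size = {}, align = {}",
        size_of::<Align64>(),
        align_of::<Align64>()
    );
}
```

Whether the stricter alignment pays off depends on the vector width actually used and on cache-line considerations, which this sketch does not measure.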
Have you tried to directly `mmap` the memory?
>I'm completely unsure what the other arguments to the macro are ("ymm"/"zmm"?) It's an additional check for the register class support (see #793). IIUC avxfma and VSHA512 instructions work over...
>for the single block case, we still basically want to fall back to the NI implementation, don't we? Maybe for maintenance or structural reasons it would be cleaner to just...
>This version uses separate backend definitions in order to avoid rebroadcasting the key from 128b -> 512b for each call to encrypt/decrypt. I don't think it's worth storing broadcasted...
The main reason is that it would quadruple the size of `Aes*` states. Even worse, with autodetection enabled it would affect targets without AVX-512 (remember that we use `union` in...
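To make the size argument concrete, here is a minimal sketch. `SoftKeys`, `Keys128`, `Keys512`, and `State` are hypothetical stand-ins invented for this example, not the actual crate types, and the count of 11 round keys assumes AES-128. It shows that a union is as large as its largest field, so a 512-bit broadcast variant would enlarge the state even on targets that never use it.

```rust
use core::mem::size_of;

// Assumption for this sketch: AES-128, which has 11 round keys.
const ROUND_KEYS: usize = 11;

/// Hypothetical software-backend state (44 32-bit words).
#[derive(Clone, Copy)]
#[repr(C)]
struct SoftKeys([u32; 4 * ROUND_KEYS]);

/// Hypothetical round keys stored as 128-bit values.
#[derive(Clone, Copy)]
#[repr(C)]
struct Keys128([[u8; 16]; ROUND_KEYS]);

/// Hypothetical round keys pre-broadcast to 512-bit values.
#[derive(Clone, Copy)]
#[repr(C)]
struct Keys512([[u8; 64]; ROUND_KEYS]);

/// Hypothetical autodetection state: one field per backend.
#[allow(dead_code)]
#[repr(C)]
union State {
    soft: SoftKeys,
    ni: Keys128,
    avx512: Keys512,
}

fn main() {
    println!("Keys128: {} bytes", size_of::<Keys128>());
    println!("Keys512: {} bytes", size_of::<Keys512>());
    // The union must fit its largest field, whichever backend is selected at runtime.
    println!("State:   {} bytes", size_of::<State>());
}
```

Under these assumptions the broadcast keys take four times the space of the 128-bit keys, and the union inherits that size regardless of which field is active.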
>The only thing that really changed was that I moved the call to .map on [__m128i; N] array from the expand_key and inv_expanded_keys functions to the get_enc_backend and get_dec_backend functions....
@silvanshade Yes, I think it's better to experiment with minor modifications in separate PRs. I will try to fully review the code this week (likely during the weekend) and probably will...