Sergey "Shnatsel" Davidoff

445 comments by Sergey "Shnatsel" Davidoff

I made a very basic [prototype implementation](https://github.com/Shnatsel/rust-audit) of this a while ago and opened an RFC for Cargo: https://github.com/rust-lang/rfcs/pull/2801

The implementation of `BorshSerialize` specifically should probably just coerce TinyVec to a slice and serialize that. Deserialization, on the other hand, is trickier. Note that it's also possible to implement...

> Typically the "cross-platform" way to do something like that would be to repeat the 4 `u8s` across the width of a vector. Yes, this is exactly what I'm trying...

Ignoring endianness for now, the following code seems to do roughly what I want:

```rust
let mask_u32 = u32::from_bytes(mask);
let pattern = faster::u32s(mask_u32).be_u8s();
buf.simd_iter(u8s(0)).simd_map(|v| v ^ pattern).scalar_collect().truncate(buf.len())
```

However, this...
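For comparison, here is a rough sketch of the same idea on stable Rust without the `faster` crate: repeat the 4-byte mask across a wider block and XOR the buffer in place, one block at a time. The function name `xor_mask_in_place` and the 16-byte block width are assumptions made for illustration, not taken from the original code. Whether the compiler turns the inner loop into vector instructions depends on the target and optimization level.

```rust
/// XOR `buf` in place with a 4-byte mask repeated over its whole length.
/// Hypothetical helper: the name and the 16-byte block width are assumptions.
fn xor_mask_in_place(buf: &mut [u8], mask: [u8; 4]) {
    // Repeat the 4 mask bytes across a 16-byte block.
    let mut pattern = [0u8; 16];
    for (i, p) in pattern.iter_mut().enumerate() {
        *p = mask[i % 4];
    }

    // Process all full 16-byte blocks.
    let mut chunks = buf.chunks_exact_mut(16);
    for chunk in &mut chunks {
        for (b, p) in chunk.iter_mut().zip(pattern.iter()) {
            *b ^= *p;
        }
    }

    // The tail starts at a multiple of 16 (hence of 4),
    // so the mask phase restarts at index 0.
    for (i, b) in chunks.into_remainder().iter_mut().enumerate() {
        *b ^= mask[i % 4];
    }
}
```

Because this works byte by byte on a repeated pattern, it sidesteps the endianness question entirely. It also mutates the buffer in place, so no allocation is needed regardless of input size.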

The input can be almost arbitrarily large and is not guaranteed to fit on the stack, so it's going to be either a heap allocation or an in-place mutation. FWIW...

Nope, plain old FX-4300. I'm on a nightly compiler, obviously, so code generation might vary from version to version. The compiler version I've tested this on is `rustc 1.28.0-nightly (60efbdead 2018-06-23)`. I'm benchmarking...

https://github.com/Shnatsel/tungstenite-rs/tree/mask-simd - it's under benches/ in branch `mask-simd`. I've been working on it in the tungstenite codebase (a WebSocket protocol implementation in Rust), but it's a self-contained file and can probably be...

Also, with SIMD disabled I get performance roughly equal to the naive per-byte XOR - `apply_mask_fallback()` in that file. The polyfill does not have to be that slow - for...
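For reference, a naive per-byte XOR of the kind mentioned above might look like the sketch below. The body of `apply_mask_fallback()` is not shown here, so this is an assumed equivalent under the hypothetical name `naive_xor_mask`, not the actual function from that file.

```rust
/// Naive baseline: XOR every byte with the mask byte at the same
/// position modulo 4. Hypothetical stand-in for a per-byte fallback.
fn naive_xor_mask(buf: &mut [u8], mask: [u8; 4]) {
    for (i, byte) in buf.iter_mut().enumerate() {
        // `i & 3` is equivalent to `i % 4` for the 4-byte mask.
        *byte ^= mask[i & 3];
    }
}
```

A baseline like this is handy as a correctness oracle in benchmarks: any faster variant should produce byte-identical output, and applying the mask twice should restore the original buffer.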

Turns out AVX requires a switch to a higher power consumption state and this takes time; until that happens, it runs at significantly lower frequencies. So using AVX is not...

Yeah, that's not unexpected. The good news is that if the change of backend makes a difference, that means we're not bottlenecked by parallelization overhead or by some single-threaded task....