SIMD optimization
This crate is currently used by uutils/coreutils for programs like basenc, base32, and base64, and I'm looking for ways to optimize them. I've experimented with an AVX2 implementation of simple Base32 encoding and the results were promising - to this end, I'm looking to write optimized SIMD implementations of encoding and decoding for individual format specifications. Is this something data-encoding can support, given its current API? Would such implementation (and possibly API) changes be welcome? Or should I write my own crate instead?
Thanks for reaching out!
This crate focuses (in this order) on: correctness, user experience, generality, performance. So SIMD implementations if they only target specific encodings would favor performance over generality, which I'd like to avoid. The way I see there would be a few options:
- Writing some
data-encoding-simdproc-macro (or it could be part ofdata-encoding-macrotoo) to create SIMD-optimized encodings given their specification. There's 2 sub-options: the generated type is the same regardless of the specification (which may have a performance cost) or is unique per specification. I'm not sure I would have time to look into this, so that would probably not be out quickly. But that would be my preferred option (in particular if I can also provide constant-time implementations at the same time). - Adding some
simdmodule indata-encodingwith specific implementation for specific but common specifications. This would be a good way to get something out while the first option above is being worked on. That's my preferred practical solution because I don't expect the first solution to see the light of day any time soon (unless you're willing to attempt it). - Having a specific base32 SIMD crate (similar to
vb64for base64, although that one is new and apparently not meant to be maintained) which could be referenced from data-encoding documentation for those looking for performance, assuming that crate would be maintained.
What do you think? What's the time frame you would like this to be out? And how much effort are you willing to invest?
Unfortunately, I think the goals I have in mind don't align with the goals for data-encoding.
-
I don't think a proc-macro can deliver the kind of efficiency that I'm hoping to achieve with a manual implementation. For example, the index-to-character translation for standard Base32 is far easier than a table lookup - just translate
0 .. 26toA .. Zand26 .. 32to2 .. 8. For something like AVX2, where there's no efficient wide byte permutation available, this is much easier to deal with. Sincedata-encodingdefines specifications using full tables, deriving good translation functions is not going to be easy. -
While I think that a
simdmodule with format-specific implementations could achieve the efficiency gains I want, I think it would end up outweighing the rest of the library in terms of complexity and maintenance cost. Besides, I'd like to experiment with a different API which is more amenable to my specific use-case with respect to streaming. -
I think I'm going to write my own crate which doesn't try to offer the same kind of flexibility that
data-encodingcan. I'll be able to only work on the formats I need, and provide gradual optimization where I can freely take advantage of SIMD features and more recent Rust features.
I love doing these kinds of performance optimizations, so I'm willing to sink a fair chunk of time into this. But given how specific my goals are, I think writing my own crate makes the most sense.
Thanks for creating and maintaining this crate!
I don't think a proc-macro can deliver the kind of efficiency that I'm hoping to achieve with a manual implementation.
I don't see any theoretical reason it couldn't, but I can see how it seems uncertain (and may need some work to actually get it to work). I'll look into it if I ever get time. I think it's interesting.
I think writing my own crate makes the most sense.
Sounds good! Please send me a link to your crate once it's out. I'm definitely going to take a look at it for both:
- learn about implementations using SIMD features
- check what API you come up with (I'll ultimately like to revamp my API a bit and am curious to know what are the current shortcomings)
Just an update here. I have a basic design that I think works, but that's too much work for me to finish in the medium term. Maybe I'll pick this back up next year. Keeping open and renaming the issue.