coreutils
wc: Use SIMD for performance
`wc` would benefit from using SIMD.
See:
- https://github.com/expr-fi/fastlwc
- https://github.com/Freaky/cw
Both projects are MIT licensed and the latter is written in Rust. Perhaps one could copy-paste some of the code from there.
@ArniDagur did you contact the upstream authors? If we do merge one of these projects, it should be done in coordination with them!
> @ArniDagur did you contact the upstream authors?
No.
> If we do merge one of these projects, it should be done in coordination with them!
Sure. It's not a legal requirement given the license compatibility, but I agree it's good practice. However, I think we'd most probably just borrow some routines -- not entire programs.
> Sure. It's not a legal requirement given the license compatibility, but I agree it's good practice. However, I think we'd most probably just borrow some routines -- not entire programs.
IMO, it would still be inappropriate. If you want to merge someone else's code, it is best to contact them before doing so.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hasn't been fixed yet
Seemed like a free win so I looked into it...
The line count is easily "SIMD-able", and we might already be doing something like that with the `bytecount` crate.
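To illustrate why line counting vectorizes well, here is a minimal sketch using only the standard library. It is not the code `wc` or `bytecount` actually uses; `count_newlines` is a hypothetical helper. Each byte is compared against `\n` independently of its neighbours, so the loop has no cross-byte state and the compiler is usually able to auto-vectorize it.

```rust
/// Count `\n` bytes in a buffer.
/// Hypothetical helper for illustration; not the uutils or bytecount implementation.
fn count_newlines(buf: &[u8]) -> usize {
    let mut total = 0usize;
    // Work in blocks of at most 255 bytes so the u8 accumulator cannot overflow.
    // A narrow accumulator keeps more lanes per vector register if the
    // compiler decides to vectorize the inner loop.
    for block in buf.chunks(255) {
        let mut acc: u8 = 0;
        for &b in block {
            // The comparison yields 0 or 1 and depends only on this byte.
            acc += (b == b'\n') as u8;
        }
        total += acc as usize;
    }
    total
}
```

Calling `count_newlines(b"one\ntwo\nthree")` counts the two newline bytes in the buffer.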
Once that is out of the way, the runtime is dominated by the word counting.
Unfortunately, all the fast wc implementations use SIMD bit tricks that only work with ASCII text.
I read a proof of concept for SIMD with UTF-8 parsing, but it's very complicated and only gets a 2x speedup instead of a 10x speedup.
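A rough scalar sketch of the ASCII-only approach, to show where it breaks down. The function names are hypothetical, and this is not taken from fastlwc or cw. The idea is to classify every byte as whitespace or not and count the transitions from whitespace to non-whitespace; SIMD versions do the same classification on many bytes at once.

```rust
/// ASCII whitespace: space, \t, \n, vertical tab, form feed, \r.
/// Hypothetical helper for illustration.
fn is_ascii_space(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | 0x0B | 0x0C | b'\r')
}

/// Count words by counting whitespace -> non-whitespace transitions.
/// Each byte is classified on its own, which is only valid for ASCII text.
fn count_words_ascii(buf: &[u8]) -> usize {
    let mut words = 0usize;
    let mut prev_space = true; // treat the start of the buffer as whitespace
    for &b in buf {
        let space = is_ascii_space(b);
        words += (prev_space && !space) as usize;
        prev_space = space;
    }
    words
}

/// Unicode-aware count using the standard library, for comparison.
fn count_words_unicode(text: &str) -> usize {
    text.split_whitespace().count()
}
```

The two functions agree on ASCII input but diverge once a multi-byte whitespace character appears. For example, U+2003 (EM SPACE) is encoded as three bytes, none of which is ASCII whitespace, so `count_words_ascii("a\u{2003}b".as_bytes())` sees one word while `count_words_unicode("a\u{2003}b")` sees two. Handling that correctly means decoding code points rather than classifying single bytes.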