jetscii
ByteSubstring: add an option to use an owned needle (not borrowed)
This PR is a suggestion. I have done a minimal implementation, just so we have something to discuss.
The problem I was facing was that I couldn't use a borrowed `ByteSubstring`. So, I've modified jetscii's code to accept a `Cow` object.
Let me know what you think and if there are any more modifications you would like to see as part of this PR.
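A minimal sketch of what the `Cow` approach enables (hypothetical names, not jetscii's actual internals; a naive scalar `find` stands in for the real SIMD/fallback search):

```rust
use std::borrow::Cow;

// Hypothetical sketch: a ByteSubstring whose needle is a Cow, so it can
// either borrow the needle or own it (e.g. when the needle is computed
// at runtime and must outlive its source).
struct ByteSubstring<'a> {
    needle: Cow<'a, [u8]>,
}

impl<'a> ByteSubstring<'a> {
    fn new(needle: &'a [u8]) -> Self {
        ByteSubstring { needle: Cow::Borrowed(needle) }
    }

    fn owned(needle: Vec<u8>) -> Self {
        ByteSubstring { needle: Cow::Owned(needle) }
    }

    // Naive scalar search, standing in for the SIMD/fallback implementations.
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        haystack
            .windows(self.needle.len())
            .position(|w| w == &*self.needle)
    }
}

fn main() {
    // Borrowed needle: lifetime tied to the literal.
    let borrowed = ByteSubstring::new(b"lo");
    assert_eq!(borrowed.find(b"hello"), Some(3));

    // Owned needle: the searcher can be returned or stored freely.
    let owned = ByteSubstring::owned(b"wor".to_vec());
    assert_eq!(owned.find(b"hello world"), Some(6));
}
```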
@shepmaster kind ping on that.
I would be happy to rebase on current main, but I would like to know if you'd be interested in this change.
Oh, wow, I'm sorry that this totally dropped off my radar!
I think the idea has merit, but I would suggest a different implementation.
Right now, we have `crate::ByteSubstring`, which delegates to `crate::fallback::ByteSubstring` and `crate::simd::ByteSubstring`. In order to support your goal, I'd suggest:

- refactoring commit
  - Stop storing the needle in `{fallback,simd}::ByteSubstring`, instead passing it into `find`.
  - Start storing the needle in `crate::ByteSubstring` instead.
- feature commit
  - Add a new `crate::OwnedByteSubstring` that holds a `Vec<u8>`.
  - This calls into `{fallback,simd}::ByteSubstring::find`.
This way, you can choose if you need ownership or just a reference.
Does that make sense and seem plausible?
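The suggested split might look roughly like this (a sketch only: module names follow the comment above, and a naive scalar `find` stands in for the real SIMD/fallback routines):

```rust
// Hypothetical sketch of the proposed refactor.
mod fallback {
    // Stateless: the needle is now passed into find() instead of stored here.
    pub struct ByteSubstring;

    impl ByteSubstring {
        pub fn find(&self, haystack: &[u8], needle: &[u8]) -> Option<usize> {
            haystack.windows(needle.len()).position(|w| w == needle)
        }
    }
}

// The public type stores the borrowed needle and delegates the search.
pub struct ByteSubstring<'a> {
    needle: &'a [u8],
    inner: fallback::ByteSubstring,
}

impl<'a> ByteSubstring<'a> {
    pub fn new(needle: &'a [u8]) -> Self {
        ByteSubstring { needle, inner: fallback::ByteSubstring }
    }
    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        self.inner.find(haystack, self.needle)
    }
}

// The new owned variant holds a Vec<u8> but reuses the same search code.
pub struct OwnedByteSubstring {
    needle: Vec<u8>,
    inner: fallback::ByteSubstring,
}

impl OwnedByteSubstring {
    pub fn new(needle: Vec<u8>) -> Self {
        OwnedByteSubstring { needle, inner: fallback::ByteSubstring }
    }
    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        self.inner.find(haystack, &self.needle)
    }
}

fn main() {
    assert_eq!(ByteSubstring::new(b"bc").find(b"abcd"), Some(1));
    assert_eq!(OwnedByteSubstring::new(b"cd".to_vec()).find(b"abcd"), Some(2));
}
```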
After musing on it for a while, I wonder if we could change `ByteSubstring` to be generic with `T: AsRef<[u8]>`.
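A minimal sketch of that generic shape (again with a naive `find` in place of the real implementation): one type then covers `&[u8]`, `Vec<u8>`, arrays, and anything else implementing `AsRef<[u8]>`.

```rust
// Sketch of the generic alternative: ownership is the caller's choice.
pub struct ByteSubstring<T: AsRef<[u8]>> {
    needle: T,
}

impl<T: AsRef<[u8]>> ByteSubstring<T> {
    pub fn new(needle: T) -> Self {
        ByteSubstring { needle }
    }

    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        // Annotate to pick the AsRef<[u8]> impl unambiguously.
        let needle: &[u8] = self.needle.as_ref();
        haystack.windows(needle.len()).position(|w| w == needle)
    }
}

fn main() {
    // Borrowed needle:
    let borrowed: ByteSubstring<&[u8]> = ByteSubstring::new(&b"lo"[..]);
    assert_eq!(borrowed.find(b"hello"), Some(3));

    // Owned needle, same type and same code path:
    let owned: ByteSubstring<Vec<u8>> = ByteSubstring::new(b"lo".to_vec());
    assert_eq!(owned.find(b"hello"), Some(3));
}
```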
Hi @shepmaster, sorry for the late response. Yes, I like your suggestion. I can contribute these changes and obviously base them on the latest "main" branch.
> Sorry for the late response

I think you've got a few months longer before you need to start apologizing for anything like that 😅
@shepmaster I've updated the PR in accordance with our discussion:

- `ByteSubstring` takes `T: AsRef<[u8]>`.
- I also added what I thought was an appropriate way to do the same thing for `Substring`, without breaking ergonomics or backward compat.

Let me know what you think.
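For illustration only, here is one shape such a generic `Substring` could take; this is my own sketch, not necessarily the approach the PR actually takes:

```rust
// Hypothetical generic Substring over anything that can be viewed as &str,
// so both borrowed (&str) and owned (String) needles work with one type.
pub struct Substring<T: AsRef<str>> {
    needle: T,
}

impl<T: AsRef<str>> Substring<T> {
    pub fn new(needle: T) -> Self {
        Substring { needle }
    }

    pub fn find(&self, haystack: &str) -> Option<usize> {
        // Annotate to pick the AsRef<str> impl unambiguously.
        let needle: &str = self.needle.as_ref();
        haystack.find(needle)
    }
}

fn main() {
    // Borrowed, as before:
    let s = Substring::new("lo");
    assert_eq!(s.find("hello"), Some(3));

    // Owned: the needle can be built at runtime and moved in.
    let dynamic = String::from("world");
    let owned = Substring::new(dynamic);
    assert_eq!(owned.find("hello world"), Some(6));
}
```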
I just realized that I only compiled it on my macOS M1, so only the fallback code was tested 🤦‍♂️
I will fix and update here...
Ok. Fixed.
@shepmaster would really love to see your comments.
@shepmaster kind ping on that
@shepmaster kind ping :)
I did one force-push to tidy up some formatting things, and a second force-push to rebase on the current main branch (which has some soundness fixes).
Overall, things seem reasonable, but I'm a bit hesitant about the dispatch macro vs. enum solution. The new code has to check the enum discriminant on each call (e.g. to `find`). The old code would use the feature test, which I believe would do some trickery to perform the test just once and then effectively set a global value that could be checked in the future. I'll need to do some performance tests.
Have you tried this code locally with a `String`? Does it work for your original case?
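The "test once, cache globally" trick described above can be sketched portably. To my understanding, `std_detect` (behind `is_x86_feature_detected!`) caches detection results in an atomic in roughly this fashion; here `probe()` is just a stand-in for the real CPUID-based check:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// 0 = not yet probed, 1 = feature absent, 2 = feature present.
static CACHE: AtomicU8 = AtomicU8::new(0);

// Stand-in for an expensive runtime CPU-feature check (e.g. CPUID).
fn probe() -> bool {
    true
}

// First call runs the probe; subsequent calls only load the atomic.
fn feature_detected() -> bool {
    match CACHE.load(Ordering::Relaxed) {
        0 => {
            let present = probe();
            CACHE.store(if present { 2 } else { 1 }, Ordering::Relaxed);
            present
        }
        state => state == 2,
    }
}

fn main() {
    assert!(feature_detected()); // probes and caches
    assert!(feature_detected()); // pure atomic load
}
```

A worst-case race just runs `probe()` twice, which is harmless since the result is stable for the process lifetime.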
Hi @shepmaster
Thanks for the review.
- ACK for the formatting and rebasing over master.
- Basically, I wanted to hear your initial feedback about the code before working harder to make things perfect ;)
- However, I did not realize that unit tests should be executed with all features turned on (I did that and fixed the `Substring` code).
- I did notice that `Substring` might have a backward-incompatible type definition. I'm not 100% sure that this is the way to go, but I think it is quite clean and obvious; I will be glad to know what you think of it.
- Regarding the dispatch macro: there is no indication in the Rust documentation that this is cached in any way (https://doc.rust-lang.org/std/macro.is_x86_feature_detected.html), so I decided to check it out: first by benchmarks (I couldn't find any significant difference in either direction), and then by disassembling the compiled executable (using `objdump -d`). The new approach was about 50% shorter for the `ByteSubstring::find` code.
Bottom line: I fixed the code a little more (now for all features), the benchmarks are running properly, and obviously the new code was force-pushed to the branch.
I don't understand exactly how, but the benchmarks all show a marked decrease in speed with this branch. That's even including the benchmarks for the standard library versions, which obviously shouldn't change!
| name | before ns/iter | after ns/iter | diff ns/iter | diff % | speedup |
|---|---|---|---|---|---|
| bench::space_ascii_chars | 356,001 (14727 MB/s) | 362,121 (14478 MB/s) | 6,120 | 1.72% | x 0.98 |
| bench::space_stdlib_find_char | 279,225 (18776 MB/s) | 278,723 (18810 MB/s) | -502 | -0.18% | x 1.00 |
| bench::space_stdlib_find_char_set | 2,608,215 (2010 MB/s) | 2,626,160 (1996 MB/s) | 17,945 | 0.69% | x 0.99 |
| bench::space_stdlib_find_closure | 2,318,368 (2261 MB/s) | 2,390,488 (2193 MB/s) | 72,120 | 3.11% | x 0.97 |
| bench::space_stdlib_find_string | 2,153,673 (2434 MB/s) | 2,268,436 (2311 MB/s) | 114,763 | 5.33% | x 0.95 |
| bench::space_stdlib_iterator_position | 1,160,877 (4516 MB/s) | 1,198,881 (4373 MB/s) | 38,004 | 3.27% | x 0.97 |
| bench::substring_stdlib_find | 463,832 (11303 MB/s) | 469,920 (11156 MB/s) | 6,088 | 1.31% | x 0.99 |
| bench::substring_with_created_searcher | 358,074 (14641 MB/s) | 361,645 (14497 MB/s) | 3,571 | 1.00% | x 0.99 |
| bench::xml_delim_3_ascii_chars | 361,004 (14523 MB/s) | 362,989 (14443 MB/s) | 1,985 | 0.55% | x 0.99 |
| bench::xml_delim_3_stdlib_find_char_closure | 3,516,929 (1490 MB/s) | 3,459,551 (1515 MB/s) | -57,378 | -1.63% | x 1.02 |
| bench::xml_delim_3_stdlib_find_char_set | 2,612,870 (2006 MB/s) | 3,456,608 (1516 MB/s) | 843,738 | 32.29% | x 0.76 |
| bench::xml_delim_3_stdlib_iterator_position | 1,131,135 (4635 MB/s) | 1,155,312 (4538 MB/s) | 24,177 | 2.14% | x 0.98 |
| bench::xml_delim_5_ascii_chars | 360,298 (14551 MB/s) | 360,735 (14533 MB/s) | 437 | 0.12% | x 1.00 |
| bench::xml_delim_5_stdlib_find_char_closure | 2,715,305 (1930 MB/s) | 3,458,322 (1516 MB/s) | 743,017 | 27.36% | x 0.79 |
| bench::xml_delim_5_stdlib_find_char_set | 3,449,795 (1519 MB/s) | 3,473,528 (1509 MB/s) | 23,733 | 0.69% | x 0.99 |
| bench::xml_delim_5_stdlib_iterator_position | 1,138,696 (4604 MB/s) | 1,158,930 (4523 MB/s) | 20,234 | 1.78% | x 0.98 |
> might have backward-incompatible type definition
Yeah, I think that this change will warrant a new semver-incompatible version to be released; that's not a big deal IMO.
> there is no indication in rust documentation that this is cached in any way

Yeah, I read through that before posting my comment (that's what led to the "I believe"). I must have made it up in my head.
> I don't understand exactly how, but the benchmarks all show a marked decrease in speed with this branch. That's even including the benchmarks for the standard library versions, which obviously shouldn't change!
That is to be expected. Benchmarks will usually show differences of ±3% depending on their nature and whatever the benchmarking server is doing at the moment.
What I usually did when I was running benchmarks with nightly Rust was to externally run them again and again and look at the avg/max/min of each individual benchmark.
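The run-repeatedly-and-aggregate approach can be as simple as the following (the sample timings are made-up numbers in the same range as the table above):

```rust
// Aggregate repeated benchmark runs into (min, avg, max), in nanoseconds.
fn stats(samples: &[u64]) -> (u64, u64, u64) {
    let min = *samples.iter().min().expect("no samples");
    let max = *samples.iter().max().expect("no samples");
    let avg = samples.iter().sum::<u64>() / samples.len() as u64;
    (min, avg, max)
}

fn main() {
    // e.g. five repeated runs of one benchmark
    let runs = [356_001, 362_121, 358_074, 361_645, 360_298];
    let (min, avg, max) = stats(&runs);
    assert_eq!(min, 356_001);
    assert_eq!(max, 362_121);
    println!("min={min} avg={avg} max={max}");
}
```

Comparing min (and the min-to-max spread) across branches is usually more stable than comparing a single run.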
If you would like, it is possible to migrate the benchmarks to use criterion, which saves you the hassle of what I described above. There is also a nice tool called critcmp which helps in comparing the results of two different criterion executions.
Oh, and criterion also runs on stable Rust, which is a bonus...
Sorry, I'm looking again at the stdlib difference, and it is weird; too high of a diff.
Let me look into it a little more.