jetscii
ByteSubstring: add an option to use an owned needle (not borrowed)
This PR is a suggestion. I have done a minimal implementation, just so we have something to discuss.
The problem I was facing was that I couldn't use a borrowed `ByteSubstring`. So, I've modified jetscii's code to accept a `Cow` object.
Let me know what you think and if there are any more modifications you would like to see as part of this PR.
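A minimal sketch of what the `Cow` approach enables (hypothetical names, not jetscii's actual internals; a naive scalar `find` stands in for the real SIMD/fallback search):

```rust
use std::borrow::Cow;

// Hypothetical sketch: a ByteSubstring whose needle is a Cow, so it can
// either borrow the needle or own it (e.g. when the needle is computed
// at runtime and must outlive its source).
struct ByteSubstring<'a> {
    needle: Cow<'a, [u8]>,
}

impl<'a> ByteSubstring<'a> {
    fn new(needle: &'a [u8]) -> Self {
        ByteSubstring { needle: Cow::Borrowed(needle) }
    }

    fn owned(needle: Vec<u8>) -> Self {
        ByteSubstring { needle: Cow::Owned(needle) }
    }

    // Naive scalar search, standing in for the SIMD/fallback implementations.
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        haystack
            .windows(self.needle.len())
            .position(|w| w == &*self.needle)
    }
}

fn main() {
    // Borrowed needle: lifetime tied to the literal.
    let borrowed = ByteSubstring::new(b"lo");
    assert_eq!(borrowed.find(b"hello"), Some(3));

    // Owned needle: the searcher can be returned or stored freely.
    let owned = ByteSubstring::owned(b"wor".to_vec());
    assert_eq!(owned.find(b"hello world"), Some(6));
}
```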
@shepmaster kind ping on that.
I would be happy to rebase on current main, but I would like to know if you'd be interested in this change.
Oh, wow, I'm sorry that this totally dropped off my radar!
I think the idea has merit, but I would suggest a different implementation.
Right now, we have `crate::ByteSubstring`, which delegates to `crate::fallback::ByteSubstring` and `crate::simd::ByteSubstring`. In order to support your goal, I'd suggest:

- refactoring commit
  - Stop storing the needle in `{fallback,simd}::ByteSubstring`, instead passing it into `find`.
  - Start storing the needle in `crate::ByteSubstring` instead.
- feature commit
  - Add a new `crate::OwnedByteSubstring` that holds a `Vec<u8>`.
  - This calls into `{fallback,simd}::ByteSubstring::find`.
This way, you can choose if you need ownership or just a reference.
Does that make sense and seem plausible?
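The suggested split might look roughly like this (a sketch only: module names follow the comment above, and a naive scalar `find` stands in for the real SIMD/fallback routines):

```rust
// Hypothetical sketch of the proposed refactor.
mod fallback {
    // Stateless: the needle is now passed into find() instead of stored here.
    pub struct ByteSubstring;

    impl ByteSubstring {
        pub fn find(&self, haystack: &[u8], needle: &[u8]) -> Option<usize> {
            haystack.windows(needle.len()).position(|w| w == needle)
        }
    }
}

// The public type stores the borrowed needle and delegates the search.
pub struct ByteSubstring<'a> {
    needle: &'a [u8],
    inner: fallback::ByteSubstring,
}

impl<'a> ByteSubstring<'a> {
    pub fn new(needle: &'a [u8]) -> Self {
        ByteSubstring { needle, inner: fallback::ByteSubstring }
    }
    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        self.inner.find(haystack, self.needle)
    }
}

// The new owned variant holds a Vec<u8> but reuses the same search code.
pub struct OwnedByteSubstring {
    needle: Vec<u8>,
    inner: fallback::ByteSubstring,
}

impl OwnedByteSubstring {
    pub fn new(needle: Vec<u8>) -> Self {
        OwnedByteSubstring { needle, inner: fallback::ByteSubstring }
    }
    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        self.inner.find(haystack, &self.needle)
    }
}

fn main() {
    assert_eq!(ByteSubstring::new(b"bc").find(b"abcd"), Some(1));
    assert_eq!(OwnedByteSubstring::new(b"cd".to_vec()).find(b"abcd"), Some(2));
}
```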
After musing on it for a while, I wonder if we could change `ByteSubstring` to be generic with `T: AsRef<[u8]>`.
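A minimal sketch of that generic shape (again with a naive `find` in place of the real implementation): one type then covers `&[u8]`, `Vec<u8>`, arrays, and anything else implementing `AsRef<[u8]>`.

```rust
// Sketch of the generic alternative: ownership is the caller's choice.
pub struct ByteSubstring<T: AsRef<[u8]>> {
    needle: T,
}

impl<T: AsRef<[u8]>> ByteSubstring<T> {
    pub fn new(needle: T) -> Self {
        ByteSubstring { needle }
    }

    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        // Annotate to pick the AsRef<[u8]> impl unambiguously.
        let needle: &[u8] = self.needle.as_ref();
        haystack.windows(needle.len()).position(|w| w == needle)
    }
}

fn main() {
    // Borrowed needle:
    let borrowed: ByteSubstring<&[u8]> = ByteSubstring::new(&b"lo"[..]);
    assert_eq!(borrowed.find(b"hello"), Some(3));

    // Owned needle, same type and same code path:
    let owned: ByteSubstring<Vec<u8>> = ByteSubstring::new(b"lo".to_vec());
    assert_eq!(owned.find(b"hello"), Some(3));
}
```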
Hi @shepmaster, sorry for the late response. Yes, I like your suggestion. I can contribute these changes and obviously base them on the latest "main" branch.
> Sorry for the late response

I think you've got a few months longer before you need to start apologizing for anything like that 😅
@shepmaster I've updated the PR in accordance with our discussion:

- `ByteSubstring` takes `T: AsRef<[u8]>`.
- I also added what I thought was an appropriate way to do the same thing for `Substring`, without breaking ergonomics or backward compat.

Let me know what you think.
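For illustration only, here is one shape such a generic `Substring` could take; this is my own sketch, not necessarily the approach the PR actually takes:

```rust
// Hypothetical generic Substring over anything that can be viewed as &str,
// so both borrowed (&str) and owned (String) needles work with one type.
pub struct Substring<T: AsRef<str>> {
    needle: T,
}

impl<T: AsRef<str>> Substring<T> {
    pub fn new(needle: T) -> Self {
        Substring { needle }
    }

    pub fn find(&self, haystack: &str) -> Option<usize> {
        // Annotate to pick the AsRef<str> impl unambiguously.
        let needle: &str = self.needle.as_ref();
        haystack.find(needle)
    }
}

fn main() {
    // Borrowed, as before:
    let s = Substring::new("lo");
    assert_eq!(s.find("hello"), Some(3));

    // Owned: the needle can be built at runtime and moved in.
    let dynamic = String::from("world");
    let owned = Substring::new(dynamic);
    assert_eq!(owned.find("hello world"), Some(6));
}
```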
I just realized that I only compiled it on my macOS M1, so only the fallback code was tested 🤦‍♂️
I will fix and update here...
Ok. Fixed.
@shepmaster would really love to see your comments.
@shepmaster kind ping on that
@shepmaster kind ping :)
I did one force-push to tidy up some formatting things, and a second force-push to rebase on the current main branch (which has some soundness fixes).
Overall, things seem reasonable, but I'm a bit hesitant about the dispatch macro vs. enum solution. The new code has to check the enum discriminant on each call (e.g. to `find`). The old code would use the feature test, which I believe would do some trickery to perform the test just once and then effectively set a global value that could be checked in the future. I'll need to do some performance tests.
Have you tried this code locally with a `String`? Does it work for your original case?
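The "test once, cache globally" trick described above can be sketched portably. To my understanding, `std_detect` (behind `is_x86_feature_detected!`) caches detection results in an atomic in roughly this fashion; here `probe()` is just a stand-in for the real CPUID-based check:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// 0 = not yet probed, 1 = feature absent, 2 = feature present.
static CACHE: AtomicU8 = AtomicU8::new(0);

// Stand-in for an expensive runtime CPU-feature check (e.g. CPUID).
fn probe() -> bool {
    true
}

// First call runs the probe; subsequent calls only load the atomic.
fn feature_detected() -> bool {
    match CACHE.load(Ordering::Relaxed) {
        0 => {
            let present = probe();
            CACHE.store(if present { 2 } else { 1 }, Ordering::Relaxed);
            present
        }
        state => state == 2,
    }
}

fn main() {
    assert!(feature_detected()); // probes and caches
    assert!(feature_detected()); // pure atomic load
}
```

A worst-case race just runs `probe()` twice, which is harmless since the result is stable for the process lifetime.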
Hi @shepmaster
Thanks for the review.
- ACK for the formatting and rebasing over master.
- Basically, I wanted to hear your initial feedback about the code before working harder to make things perfect ;)
- However, I did not realize that unit tests should be executed with all features turned on (I did that and fixed the `Substring` code).
- I did notice that `Substring` might have a backward-incompatible type definition. I'm not 100% sure that this is the way to go, but I think it is quite clean and obvious; I will be glad to know what you think of it.
- Regarding the dispatch macro: there is no indication in the Rust documentation that this is cached in any way (https://doc.rust-lang.org/std/macro.is_x86_feature_detected.html), so I decided to check it out: first by benchmarks (I couldn't find any significant difference in either direction), and then by disassembling the compiled executable (using `objdump -d`). The new approach was about 50% shorter for the `ByteSubstring::find` code.
Bottom line: I fixed the code a little more (now for all features), the benchmarks are running properly, and obviously the new code was force-pushed to the branch.
I don't understand exactly how, but the benchmarks all show a marked decrease in speed with this branch. That's even including the benchmarks for the standard library versions, which obviously shouldn't change!
| name | before ns/iter | after ns/iter | diff ns/iter | diff % | speedup |
|---|---|---|---|---|---|
| bench::space_ascii_chars | 356,001 (14727 MB/s) | 362,121 (14478 MB/s) | 6,120 | 1.72% | x 0.98 |
| bench::space_stdlib_find_char | 279,225 (18776 MB/s) | 278,723 (18810 MB/s) | -502 | -0.18% | x 1.00 |
| bench::space_stdlib_find_char_set | 2,608,215 (2010 MB/s) | 2,626,160 (1996 MB/s) | 17,945 | 0.69% | x 0.99 |
| bench::space_stdlib_find_closure | 2,318,368 (2261 MB/s) | 2,390,488 (2193 MB/s) | 72,120 | 3.11% | x 0.97 |
| bench::space_stdlib_find_string | 2,153,673 (2434 MB/s) | 2,268,436 (2311 MB/s) | 114,763 | 5.33% | x 0.95 |
| bench::space_stdlib_iterator_position | 1,160,877 (4516 MB/s) | 1,198,881 (4373 MB/s) | 38,004 | 3.27% | x 0.97 |
| bench::substring_stdlib_find | 463,832 (11303 MB/s) | 469,920 (11156 MB/s) | 6,088 | 1.31% | x 0.99 |
| bench::substring_with_created_searcher | 358,074 (14641 MB/s) | 361,645 (14497 MB/s) | 3,571 | 1.00% | x 0.99 |
| bench::xml_delim_3_ascii_chars | 361,004 (14523 MB/s) | 362,989 (14443 MB/s) | 1,985 | 0.55% | x 0.99 |
| bench::xml_delim_3_stdlib_find_char_closure | 3,516,929 (1490 MB/s) | 3,459,551 (1515 MB/s) | -57,378 | -1.63% | x 1.02 |
| bench::xml_delim_3_stdlib_find_char_set | 2,612,870 (2006 MB/s) | 3,456,608 (1516 MB/s) | 843,738 | 32.29% | x 0.76 |
| bench::xml_delim_3_stdlib_iterator_position | 1,131,135 (4635 MB/s) | 1,155,312 (4538 MB/s) | 24,177 | 2.14% | x 0.98 |
| bench::xml_delim_5_ascii_chars | 360,298 (14551 MB/s) | 360,735 (14533 MB/s) | 437 | 0.12% | x 1.00 |
| bench::xml_delim_5_stdlib_find_char_closure | 2,715,305 (1930 MB/s) | 3,458,322 (1516 MB/s) | 743,017 | 27.36% | x 0.79 |
| bench::xml_delim_5_stdlib_find_char_set | 3,449,795 (1519 MB/s) | 3,473,528 (1509 MB/s) | 23,733 | 0.69% | x 0.99 |
| bench::xml_delim_5_stdlib_iterator_position | 1,138,696 (4604 MB/s) | 1,158,930 (4523 MB/s) | 20,234 | 1.78% | x 0.98 |
> might have backward-incompatible type definition
Yeah, I think that this change will warrant a new semver-incompatible version to be released; that's not a big deal IMO.
> there is no indication in rust documentation that this is cached in any way

Yeah, I read through that before posting my comment (that's what led to the "I believe"). I must have made it up in my head.
> I don't understand exactly how, but the benchmarks all show a marked decrease in speed with this branch. That's even including the benchmarks for the standard library versions, which obviously shouldn't change!
That is to be expected. Benchmarks will usually show differences of ±3% depending on their nature and whatever the benchmarking server is doing at the moment.
What I usually did when I was running benchmarks with nightly Rust was to externally run them again and again and look at the avg/max/min of each individual benchmark.
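The run-repeatedly-and-aggregate approach can be as simple as the following (the sample timings are made-up numbers in the same range as the table above):

```rust
// Aggregate repeated benchmark runs into (min, avg, max), in nanoseconds.
fn stats(samples: &[u64]) -> (u64, u64, u64) {
    let min = *samples.iter().min().expect("no samples");
    let max = *samples.iter().max().expect("no samples");
    let avg = samples.iter().sum::<u64>() / samples.len() as u64;
    (min, avg, max)
}

fn main() {
    // e.g. five repeated runs of one benchmark
    let runs = [356_001, 362_121, 358_074, 361_645, 360_298];
    let (min, avg, max) = stats(&runs);
    assert_eq!(min, 356_001);
    assert_eq!(max, 362_121);
    println!("min={min} avg={avg} max={max}");
}
```

Comparing min (and the min-to-max spread) across branches is usually more stable than comparing a single run.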
If you would like, it is possible to migrate the benchmarks to use criterion, which saves you the hassle of what I described above. There is also a nice tool called critcmp which helps in comparing the results of two different criterion executions.
Oh, and criterion also runs on stable Rust, which is a bonus...
Sorry, I'm looking again at the stdlib difference, and it is weird; too high of a diff.
Let me look into it a little more.