otp
otp copied to clipboard
Further optimize binary matching
This pull request builds on the improvements in #6259 to further optimize binary matching. See the individual commit messages for more details.
CT Test Results
4 files 406 suites 52m 3s :stopwatch: 2 076 tests 2 027 :heavy_check_mark: 49 :zzz: 0 :x: 6 361 runs 6 294 :heavy_check_mark: 67 :zzz: 0 :x:
Results for commit 049c73a2.
:recycle: This comment has been updated with latest results.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.
See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.
Artifacts
- Complete CT logs (Download Logs)
- HTML Documentation (Download HTML Docs)
- No Windows Installer found
// Erlang/OTP Github Action Bot
Would this optimise something like the following (which is how I'd decode varints or Unsigned LEB128 - common in many compact networking protocols):
varint(<<0:1, Len:7, Rest/bits>>) ->
continue(Rest, Len);
varint(<<1:1, Len1:7, 0:1, Len2:7, Rest/bits>>) ->
Len = (Len2 bsl 7) bor Len1,
continue(Rest, Len);
...
@michalmuskala Yes, this code will be nicely optimized by this pull request and #6259.
Internally in the compiler, your example would be rewritten to something similar like this:
varint(Bin) ->
case Bin of
<<X:1, Tail/bits>> ->
case X of
0 ->
case Tail of
<<Len:7, Rest/bits>> ->
continue(Rest, Len)
end;
1 ->
case Tail of
<<Len1:7, 0:1, Len2:7, Rest/bits>> ->
Len = (Len2 bsl 7) bor Len1,
continue(Rest, Len)
end
end
end.
#6259 makes matching of segments having "odd" sizes such as 1
and 7
more efficient than in OTP 25.
This pull request will make the matching of <Len1:7, 0:1, Len2:7, Rest/bits>>
more efficient. The entire match will be contained in one bs_match
instruction:
{bs_match,{f,4},
{x,1},
{commands,[{ensure_at_least,15,1},
{integer,2,{literal,[]},7,1,{x,2}},
{'=:=',3,1,0},
{integer,3,{literal,[]},7,1,{x,0}},
{get_tail,3,1,{x,1}}]}}.
Before this pull request, there would be two bs_match
instructions with a bs_match_string
instruction in between:
{bs_match,{f,4},
{x,1},
{commands,[{ensure_at_least,7,1},
{integer,2,{literal,[]},7,1,{x,2}}]}}.
{test,bs_match_string,{f,4},[{x,1},1,{string,<<0>>}]}.
{bs_match,{f,4},
{x,1},
{commands,[{ensure_at_least,7,1},
{integer,3,{literal,[]},7,1,{x,0}},
{get_tail,3,1,{x,1}}]}}.
This looks great! Thank you for explaining.
It's a shame the two sub-byte reads are separated - I assume always reading full bytes is more efficient. In this case in particular it would be better to read <<Tag:1, Len:7, ...>>
since we'll need the Len
in both branches either way.
Is this something it would make sense to optimise in the compiler? Or perhaps structuring the code differently could help with that
It's a shame the two sub-byte reads are separated - I assume always reading full bytes is more efficient. In this case in particular it would be better to read
<<Tag:1, Len:7, ...>>
since we'll need theLen
in both branches either way.
Well, doing one read instead of two reads is faster. (Only one read instead of two would be needed for extracting a "short" length.) Since #6259, there is less difference between reading whole bytes and parts of bytes or doing unaligned reads.
It is in fact possible to restructure the source code so we avoid separating the first two sub-byte reads:
varint(Bin) ->
case Bin of
<<X:1, Len1:7, Tail/bits>> ->
case X of
0 ->
continue(Tail, Len1);
1 ->
case Tail of
<<0:1, Len2:7, Rest/bits>> ->
Len = (Len2 bsl 7) bor Len1,
continue(Rest, Len)
end
end
end.
I suppose that it would be possible for the compiler to do that kind of transformation by itself. I don't know how tricky that would be or how much it would give.