nimble_parsec parse_integers_using_lower

Currently integers are matched by pattern matching on number between 48 and 57 and then substracting 48 ascii code - that is integer 0 As 48 is 0x30 then we can just pattern match that the upper half of the byte is 3 and take the lower half - only checking if it's less than 10. I prepared a livebook that compares both ways and there is 4-5% speed difference.

Comparison: 
half byte        3.04 K
original         2.93 K - 1.04x slower +13.23 μs

https://gist.github.com/dkuku/9004bbf92cb3b81e8ed6fe211b9c7fc6 I'm not sure how to measure the compilation time but it also should be a bit faster due to generating less code.

Jul 13 '23 21:07 dkuku

This is very nice! Instead of adding a special combinator, could we perhaps have some heuristic? Then we could find more occasions to apply this automatically on top of ascii ranges?

Jul 13 '23 22:07 josevalim

@josevalim I've been thinking about it but I don't think it makes much sense - integer is a really special case where we encode 0x3y to just y. The only place where it can be useful would be converting a string to octal or binary number - it won't even help when converting from hex representation to hex word because alphabet encoding starts with 0x41 and 0x61 - so the lower boundary still needs to be checked x >0 and x < 7; x = x + 9. which is the same as x>40 and x < 47; x = x - 31 But we don't have combinators for any of these yet. Having another example that makes sense would make easier to design this. Unless you have some ideas where it can be used ?

Jul 14 '23 08:07 dkuku

The immediate heuristic I can think of is this:

If the distance between both numbers is < 16 and the rem(first, 16) == 0, then apply it
If the distance between both numbers is < 32 and the rem(first, 32) == 0, then apply it
If the distance between both numbers is < 64 and the rem(first, 64) == 0, then apply it
If the distance between both numbers is < 128 and the rem(first, 128) == 0, then apply it

It could be extended to 2, 4, and 8, but I am not sure if those will be worth it.

Jul 14 '23 10:07 josevalim

I did some testing and it has no point for strings. With integers the benefit is that we have the integer already converted 'for free' and don't have to subtract ?0. This are different return values from the function comparing to what was returned from ascii(?0..?9). With partial matching on strings there are also other problems. I'm matching on some of the bytes but I need to return everything that was given to the function so there are 2 variables needed: if lets say I want to match on the string "1" I need something like this: defp function full = <<3::4, part::4>> when part < 1, do: full or I have to reconstruct the actual variable defp function <<3::4, part::4>> when part < 1, do: <<3::4, part::4>> this makes more sense because binary patterns cannot be matched in parallel so I can't match on part of the byte and also on the full byte in one go. It is also significantly slower:

Comparison: 
original         3.49 K
half_byte        2.37 K - 1.47x slower +134.91 μs

Memory usage statistics:

Name         Memory usage
original        493.70 KB
half_byte       704.64 KB - 1.43x memory usage +210.94 KB

Jul 14 '23 21:07 dkuku

I see! Btw, which OTP version are you using for testing? 26 has more recent optimizations.

Jul 14 '23 22:07 josevalim

Yes I'm using 26. But I just tried it on 25 and there my implementation is 3 x slower, when I run the same livebook. So it has no point for anyone with older otp version, So it has no point for anyone with older otp version. Maybe this should wait until elixir requires otp26?

Jul 15 '23 06:07 dkuku

Yeah, given the low gain, we should probably hold this one for a while (and also confirm this will hold in future Erlang/OTP versions). :) Thank you!

Jul 15 '23 06:07 josevalim

nimble_parsec
nimble_parsec copied to clipboard

parse_integers_using_lower_nibble

nimble_parsec nimble_parsec copied to clipboard

parse_integers_using_lower_nibble

nimble_parsec
nimble_parsec copied to clipboard