wren icon indicating copy to clipboard operation
wren copied to clipboard

Specialize `Range.contains(_)`

Open ConfusedSky opened this issue 1 year ago • 13 comments

Creates a specialization for Range.contains(_) that is a bound check rather than using Sequence's implementation which uses iteration instead.

This came up when I was implementing some Project Euler stuff when and I noticed the pathological behavior in some of those problems as well. Particularly problem 17 was about 1000x slower using ranges (I had like 6 different ranges).

I didn't add any new tests since this method is pretty extensively covered. I did create a pathological benchmark for it though.

Benchmark included below uses ./bin/wren_test. I wasn't sure if I should include it in the benchmarks folder since it looks like that is for comparing against different languages.

Looks like it's roughly 300x faster in this pathological case, but I reran it for (0...10) and it still is about 5x faster.

Anyway, let me know if I should change anything here! This isn't too big of a change and I think it can help all users of the language.

var start = System.clock
var total = 0
for (i in 0..1e5) {
    if ((0...1e3).contains(i)) {
        total = total + 1
    }
}
System.print(total)
System.print("elapsed: %(System.clock - start)")

// Before
// 1000
// elapsed: 2.968864

// After
// 1000
// elapsed: 0.010275

ConfusedSky avatar Jul 31 '22 05:07 ConfusedSky

Looks like some great performance improvement here! Thanks a lot.

RobLoach avatar Oct 04 '22 22:10 RobLoach

This code is broken as it only checks against boundaries and thus does not produce the same result as the reference implementation. Though, if the correct behavior is implemented in C it would be slower than currently, but still much much faster than the wren counterpart.

mhermier avatar Oct 04 '22 23:10 mhermier

@mhermier What do you mean? It's passing all the tests. So as far as the tests are concerned the behavior is the same.

ConfusedSky avatar Oct 05 '22 00:10 ConfusedSky

@ConfusedSky I mean that the tests are not sufficient. Range is an iterable, not the continuous mathematical object that represents all the numbers within 2 numbers that are the limits. As a general rule, always consider what you are writing and testing against. If a test does pass or not always look at the tests you are referring to. They are human products, and can be broken/insufficient/whatever...

mhermier avatar Oct 05 '22 09:10 mhermier

@mhermier I guess I hadn't considered that, however in that case really the only thing that would need to be checked is if the number was an integer.

Although I do think continuous range checking is valuable. Potentially this could be split into two methods, where something like isInRange (probably want a better name where) could be as written, and contains would require the number to be an integer. But I agree now, the behavior is inconsistent.

ConfusedSky avatar Oct 07 '22 20:10 ConfusedSky

@ConfusedSky well situation is more complicated than that. Since there is no requirement for Range bounds to be integers, so one has to iterate the range, in case the accumulation brings accumulation errors...

I also guess there is a potential need for isInRange functionality, but it can be implemented externally. So unless there is a feasibility issue I didn't see, or a use case where it is proven to be a real performance issue, there is no need for inclusion in core.

mhermier avatar Oct 07 '22 20:10 mhermier

@mhermier Ah yea I see. But it does appear to always increment by one. I'll revisit this. image

ConfusedSky avatar Oct 07 '22 20:10 ConfusedSky

See/use calculateRange it handles all the corner cases, and in case we add customized step value support, it would be the place to handle the situation.

mhermier avatar Oct 07 '22 20:10 mhermier

It appears that the implementation in main doesn't seem to always work either image Floating point precision error?

That being said also, the specialization for contains for string, isn't consistent with the one for sequence.

ConfusedSky avatar Oct 07 '22 21:10 ConfusedSky

Saw it, it is reproduceable with (2.2-1)==1.2 returning false.

mhermier avatar Oct 07 '22 21:10 mhermier

For fun this statement can be verified in c++ using:

static_assert((2.2-1)==1.2);

which produce in compiler explorer for example:

<source>:1:22: error: static assertion failed
    1 | static_assert((2.2-1)==1.2);
      |               ~~~~~~~^~~~~
<source>:1:22: note: the comparison reduces to '(1.2000000000000002e+0 == 1.2e+0)'

mhermier avatar Oct 07 '22 22:10 mhermier

I tried node and python as well, same deal. Since presumably, it has more to do with IEE-754 than the behavior of wren.

As written, contains doesn't seem to be very useful for ranges that have from values that aren't integers.

ConfusedSky avatar Oct 07 '22 22:10 ConfusedSky

About String, it is consistent, as it produce the expected behavior with parameters of a specific shape. So if you check that the searched string is a one character String (even UTF-8 ones), it produce the normal behavior of contains(). The fact that it also search do substring is an interesting side effect that augment the behavior. But more importantly the user can act on the parameter to ban the unwanted effect.

On the other hand with the original patch, one has to act on the Range to see if the behavior was correct.

The nuance is subtle, I agree. It is a question of "does it work as expected with any valid argument" (as one could trivially guess from the API naming convention), and the original Range patch was not.

That said, the lack of a proper Char class make String a little bit to much complex to apprehend, since it is a ball of mud mixing string, character and byte array data. But that another story, and huge can of worms...

mhermier avatar Oct 07 '22 22:10 mhermier