phobos icon indicating copy to clipboard operation
phobos copied to clipboard

Added std.range.orElse which provides a default value for empty ranges

Open wilzbach opened this issue 8 years ago • 26 comments

As mentioned on #5153 if an API method could potentially return an "undefined response (aka null), there are a couple of strategies:

Method Uses in Phobos
return Nullable no precedent in Phobos (?)
use enforce (aka exceptions) generally disliked, because exceptions allocate (and everybody wants to be @nogc nowadays
add an overload with a seed std.algorithm.iteration.maxElement

As the third way seems to be the current strategy (e.g. fold, reduce, ...), it creates a lot of "redundant" code. Here I propose to add a seed(<range>, <defaultValue>) method that can be combined with any range method to solve this problem. The gist is very simple:

import std.range : iota;
assert(0.iota(10).seed(42).front == 0);
assert(0.iota.seed(42).front == 42);

The simplest way to add such a orElse method would be to combine choose and only like this:

auto orElse(Range, RangeElementType = ElementType!Range)(Range r, RangeElementType fallback)
if (isInputRange!(Unqual!(Range)) &&
    !is(CommonType!(ElementType!Range, RangeElementType) == void))
{
    import std.range : choose, only;
    return choose(!r.empty, r, only(seed));
}

However this has a couple of problems:

  • no custom type - e.g. static if (!hasDefaultValue!Range) enforce(!r.empty)
  • choose doesn't work in CTFE (#14660)
  • not fully @safe (it uses union and void[])
  • doesn't work with ranges of const elements
  • hasn't all opSlice overloads (e.g. InfiniteRanges)
  • no toString for user convenience as that's generic
  • would require a custom only that allows assigns, otherwise hasAssignableElements and hasSwappableElements won't work.

While most of these points are solvable with e.g. template mixins, I am not sure about all of them and hence this PR contains its an implementation of a OrElseRange.

Destroy this idea!

edit: renamed from seed to orElse

wilzbach avatar Feb 19 '17 20:02 wilzbach

Please note that this comment is to continue a discussion here.

You're confusing a seed with a fallback. Seed is a term used when referring reduction functions, e.g. max, sum, fold, reduce. The seed is the initial value of an accumulator. Technically, you can seed a reduction function by unconditionally prepending to the sequence of inputs - and this is important - but not by only prepending when the input is empty, as this implementation does.

Furthermore, though technically walkBack could be expressed as a reduction HOF, and this seed function would technically achieve your goal if the prepending was unconditional, and prepending to the input isn't a totally unreasonable approach to seeding a reduction function, it's important to understand that it's not an apt way to perceive what's going on.

A fallback is a default value returned when an operation received an input that would otherwise cause it to fail, as opposed to throwing an exception or (e.g. when the assert in walkBack is omitted) silently resulting in undefined behavior. Probably the best analogy is the associativearray.get(key, fallback) method. Of course, the fallback is returned when no such key exists in the array. The fallback argument is not a seed in this case, and could not possibly be implemented as one.

My mach library implements a function similar to walkBack, accepting a fallback argument. It also defines a walkindex function, similar in purpose to walkLength and walkBack, which takes a range and an index. As you'd expect, it traverses the range until it hits that index, and it returns the element there. It optionally accepts a fallback. When the range is shorter than the given index, the fallback is returned rather than a fatal error being produced. This fallback also is not a seed, and could not possibly be implemented as one.

However - this isn't necessarily an argument against including a well-implemented function serving this purpose. (Though I don't think seed is an appropriate name for what it does) Rather, this is specifically an argument against using a function such as this one rather than simply having an optional fallback argument for walkBack.

pineapplemachine avatar Feb 19 '17 22:02 pineapplemachine

The gist is very simple:

import std.range : iota; assert(0.iota(10).seed(42).front == 0); assert(0.iota.seed(42).front == 42);

This cries to me to call it "fallback" or "orElse" because that is what it does. "seed" doesn't tell me anything - all I can think of is random number generators.

DmitryOlshansky avatar Feb 20 '17 14:02 DmitryOlshansky

This cries to me to call it "fallback" or "orElse" because that is what it does.

I really like the name orElse! (renamed and moved to std.range)

wilzbach avatar Feb 21 '17 03:02 wilzbach

I really like the name orElse!

Shamelessly stolen from Scala Option ;) Now imagine if we make Nullable follow range interface this all can tick together very nicely.

DmitryOlshansky avatar Feb 21 '17 16:02 DmitryOlshansky

Phobos-idiomatic D code is going to be a language for the initiated. We need to think about rite of Passage

9il avatar Feb 27 '17 05:02 9il

Phobos-idiomatic D code is going to be a language for the initiated. We need to think about rite of Passage

Something like that: a candidate should repent about all accidents with Go. Then he should bring bones of three Scala engineers and read Stroustrup: The C++ Programming Language all night without rest. Then we will exam this persons about Phobos internals

9il avatar Feb 27 '17 05:02 9il

One more thing: orElse should also accept a full range that is returned if the first range is empty.

andralex avatar Feb 28 '17 19:02 andralex

From the NG:

Yep. Additions to std.range like orElse will make idiomatic Phobos code slower then C++ and Scala. It is not clear when you do a benchmark for single implementation, but idiomatic combinations of D Ranges will became slow.

To bring this a bit into context: mir-algorithm uses C++-like iterators to avoid this - an example:

IotaIterator!int iota;
RetroIterator!(IotaIterator!int) retro;

++iota;
--retro;
assert(*retro == *iota);

http://docs.algorithm.dlang.io/latest/mir_ndslice_iterator.html

wilzbach avatar Mar 05 '17 06:03 wilzbach

To bring this a bit into context: mir-algorithm uses C++-like iterators to avoid this - an example:

  1. The main orElse performance problem is that D does not have virtual function calls for structs like C++ does. With virtual cals we can avoid if conditions.

  2. chain performance problem is if condition too. It can be solved by introducing a generic analog of opApply. It will affect basic Dlang iteration idioms like chain(...).until, chain(...).array and chain(...).filter.

  3. bitwise problem is that it is implementation for random access range and input range APIs is the same. 2 separate implementations would be faster, few times smaller and clearer (plus less bugs). Recently bitpack (for few bits integers) was added into mir-algorithm. It was very easy to add and to test it. Both bitpack and bitwise require only two methods to implement: opIndex and opIndexAssign. That is all, 31 LOC and 42 LOC respectively. A type with opIndex is called field, on top of a field a random access iterator (in terms of C++) can be created using FieldIterator wrapper. On top of any random access iterator (including pointers) can be created a Slice type that implements all random access range primitives including multidimensional features. This approach simplifies multiple times random access range implementations. For already implemented features please see ndslice scheme. Please note that input range API for bitwise is buggy because random number generators should have special this(this) implementation. As input range bitwise alternative for RNGs Bernoulli2Variable can be used.

9il avatar Mar 05 '17 07:03 9il

  1. The main orElse performance problem is that D does not have virtual function calls for structs like C++ does. With virtual cals we can avoid if conditions.

I find this claim surprising. Replacing a virtual call having 2-7 possible landing sites with a switch or cascade of if/else statements is a well known and venerable optimization, which to my knowledge has been first documented by a few seminal papers in the 90s, notably:

  • "Profile-Guided Receiver Class Prediction" by Grove et al (http://www.research.ibm.com/people/d/dgrove/papers/oopsla95.pdf)
  • "Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis" by Dean et al (http://web.cs.ucla.edu/~palsberg/tba/papers/dean-grove-chambers-ecoop95.pdf)

The optimization called "guarded devirtualization" or "type casing" is commonly taught in graduate classes in compiler design and replaces essentially code that you claim is fast, with code that you claim is slow. The win comes from replacing indirect calls (not inlinable and difficult to speculate) with branches and regular calls (easy to inline and speculate).

Of course, this technique, although considered canon, is of venerable age and may have become stale due to recent advances in CPU technology. However the entire HHVM team has used it in recent times systematically in their JIT PHP compiler implemented in C++. Replacing virtual calls with typecased switch statements has always been fruitful, and can be seen everywhere in the code. (I also found an easy to digest recent independent benchmark: http://dimitri-christodoulou.blogspot.com/2011/09/case-for-replacing-polymorphism-with.html.)

For chain and orElse in particular, we're looking at a branch where speculation may fail only once or (respectively) never. Replacing these with indirect calls and obtaining a performance improvement would be a very remarkable result. I'd be in your debt if you could produce such an experiment.

Regarding bitwise, any improvements would be welcome! I recall you submitted some and/or promised more. Please advise. Thanks!

andralex avatar Mar 05 '17 12:03 andralex

The main orElse performance problem is that D does not have virtual function calls for structs like C++ does. With virtual cals we can avoid if conditions. I find this claim surprising. Replacing a virtual call having 2-7 possible landing sites with a switch or cascade of if/else statements is a well known and venerable optimization, which to my knowledge has been first documented by a few seminal papers in the 90s, notably:

"Profile-Guided Receiver Class Prediction" by Grove et al (http://www.research.ibm.com/people/d/dgrove/papers/oopsla95.pdf) "Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis" by Dean et al (http://web.cs.ucla.edu/~palsberg/tba/papers/dean-grove-chambers-ecoop95.pdf) The optimization called "guarded devirtualization" or "type casing" is commonly taught in graduate classes in compiler design and replaces essentially code that you claim is fast, with code that you claim is slow. The win comes from replacing indirect calls (not inlinable and difficult to speculate) with branches and regular calls (easy to inline and speculate).

Of course, this technique, although considered canon, is of venerable age and may have become stale due to recent advances in CPU technology. However the entire HHVM team has used it in recent times systematically in their JIT PHP compiler implemented in C++. Replacing virtual calls with typecased switch statements has always been fruitful, and can be seen everywhere in the code. (I also found an easy to digest recent independent benchmark: http://dimitri-christodoulou.blogspot.com/2011/09/case-for-replacing-polymorphism-with.html.)

For simple benchmarks orElse maybe faster then vtable. But it ads if conditions while vtable just replace pointers. For example complex pipeline based on chain, choose, chooseAmong and orElse will became slower after each pipeline addition. In other hand vtables plus special iteration patter for chain allow to generate fast and tiny code for any kind of function pipeline.

The advantage of devirtualisation is that it is done by optimiser after virtual tables were constructed.

For chain and orElse in particular, we're looking at a branch where speculation may fail only once or (respectively) never. Replacing these with indirect calls and obtaining a performance improvement would be a very remarkable result. I'd be in your debt if you could produce such an experiment.

stack function was added to mir-algorithm today. It is multidimensional chain analog for ndslice.

import mir.ndslice.allocation: slice;
import mir.ndslice.topology: iota;

// 0, 1, 2
// 3, 4, 5
auto a = iota(2, 3);
// 0, 1
// 2, 3
auto b = iota(2, 2);
// 0, 1, 2, 3, 4
auto c = iota(1, 5);

// 0, 1, 2, | 0, 1
// 3, 4, 5, | 2, 3
// ---------------
// 0, 1, 2,   3, 4
// construction phase
auto s = stack(stack!1(a, b), c);

// allocation phase
auto d = s.slice;
assert(d == [
    [0, 1, 2, 0, 1],
    [3, 4, 5, 2, 3],
    [0, 1, 2, 3, 4],
    ]);

To work with stack special until function was added. Its idea is similar to opApply concept. If a predicate always returns false then until just iterates all elements in a stack. stack + until does not require additional conditions to iterate all elements comparing with chain + Range API.

The simplest performance optimisations that can be added to Phobos are array, copy and put specialisations for chain.

Regarding bitwise, any improvements would be welcome! I recall you submitted some and/or promised more. Please advise. Thanks!

I think that we should move forward with single random access range declaration (ndslice) and have separate implementation for random access ranges and non-random access ranges. I propose to freeze Phobos additions and add only improvements and bugfixes. Other my thoughts about Phobos you already know :-)

9il avatar Mar 06 '17 12:03 9il

For simple benchmarks orElse maybe faster then vtable. But it ads if conditions while vtable just replace pointers. For example complex pipeline based on chain, choose, chooseAmong and orElse will became slower after each pipeline addition. In other hand vtables plus special iteration patter for chain allow to generate fast and tiny code for any kind of function pipeline.

That doesn't sound right. Composition with virtual functions (Java-style) or indirect calls (Haskell style) adds one extra indirect call per composition, which are notoriously difficult to optimize.

Do you have any evidence to support the claim that you can compose freely with virtuals and have them all gone by the time the code is run?

To work with stack special until function was added. Its idea is similar to opApply concept. If a predicate always returns false then until just iterates all elements in a stack. stack + until does not require additional conditions to iterate all elements comparing with chain + Range API.

But that's a different design than what you discussed previously.

The simplest performance optimisations that can be added to Phobos are array, copy and put specialisations for chain.

Yah, range methods that specialize certain generic functions would be good to add. I recall you did have a pull request to that effect, what came of it?

andralex avatar Mar 06 '17 13:03 andralex

That doesn't sound right. Composition with virtual functions (Java-style) or indirect calls (Haskell style) adds one extra indirect call per composition, which are notoriously difficult to optimize.

I mean D interface style. choose will just return one of its arguments for example, so it will add nothing.

Do you have any evidence to support the claim that you can compose freely with virtuals and have them all gone by the time the code is run?

Nope, we can not remove all virtual calls without advanced JIT, but we can reduce pipeline of virtual calls to single one in expression likechoose(cond1, a, b).orElse(c).chooseAmong(cond2, d, e, f). The result will be just one of a, b, c, d, e, or f. But all ranges should be interfaces/classes (in D).

I recall you did have a pull request to that effect, what came of it?

A PR was for map specialisation. I implemented our ideas for ndslice iterators and fields.

9il avatar Mar 06 '17 13:03 9il

To work with stack special until function was added. Its idea is similar to opApply concept. If a predicate always returns false then until just iterates all elements in a stack. stack + until does not require additional conditions to iterate all elements comparing with chain + Range API.

But that's a different design than what you discussed previously.

Citation: chain performance problem is if condition too. It can be solved by introducing a generic analog of opApply

9il avatar Mar 06 '17 13:03 9il

@9il yah the relative tradeoffs of internal and external iteration are well trodden ground. A good read: http://gafter.blogspot.com/2007/07/internal-versus-external-iterators.html (and of course the GoF book).

@wilzbach phobos uses external iteration, but you'd do good here to improve some cases by having orElse implement each like this:

void each(alias fun)()
{
     if (useElse) fun(other);
     else source.each!fun;
}

This composes properly and makes for a simple pass through everything. Thx!

andralex avatar Mar 08 '17 13:03 andralex

ping @wilzbach

andralex avatar Mar 18 '17 07:03 andralex

I wonder whether or not a coalesce function would be more useful.

trikko avatar Mar 23 '17 15:03 trikko

@trikko what would be the semantics of coalesce? thx!

andralex avatar Mar 23 '17 20:03 andralex

Just like in SQL. coalesce(a, b, c, d, ...) select the first non-null. I would select a.front if a is a range or only(a).front if not.

trikko avatar Mar 23 '17 20:03 trikko

(we can force the first argument to be a range, to static check types of other args)

trikko avatar Mar 23 '17 20:03 trikko

Isn't this rather like chain?

dnadlinger avatar Mar 23 '17 20:03 dnadlinger

@klickverbot it's a conditional chain

andralex avatar Mar 23 '17 20:03 andralex

More like chain(…).front with automatic application of only I suppose, yes. (I was referring to the more general coalesce; trying to figure out where the design sweet spot in terms of simplicity versus functionality lies.)

dnadlinger avatar Mar 23 '17 20:03 dnadlinger

Isn't coalesce just a orelse with more than two args?

trikko avatar Mar 23 '17 21:03 trikko

I'm certainly no expert in D, but is there too much wrong with the following implementation of orElse?:

auto orElse(Range, T)(Range r, T defaultValue) if (isInputRange!Range && is(T : ElementType!Range)) {
  return r.empty ? r.chain([defaultValue]) : r.chain((T[]).init);
}

But you lose a few things so maybe that's not a viable alternative. From the unittests you defined, the following fails:

  • slicing doesn't work on the returned orElse range, so your tests on infinite ranges fails with Error: no [] operator overload for type Result
  • Your hasSwappableElements, hasLvalueElements, and hasAssignableElements fail
  • The @nogc tests fail.

So chain doesn't seem to give you a range that's similar to what you put in. The docs say "If all input ranges offer random access and length, Chain offers them as well."

Maybe an alternative is to have orElse implemented with chain, and "fix" chain to offer opSlice if all input ranges have slicing, etc? Also in the implementation of chain I see an attempt to handle hasSwappableElements but also a comment that says @@@BUG@@@ (not sure what the bug is though)

PS: first post here for me in D's github, so I claim the right to ignorance on D's internals and open source directions incase the above is just way too silly ;)

Edit:

Wait or isn't this just: choose(r.empty, [defaultValue], r) ?

aliak00 avatar Jan 01 '18 23:01 aliak00

Thanks for your pull request, @wilzbach!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#5154"

dlang-bot avatar Apr 06 '18 15:04 dlang-bot