
Improve pure functions spec

Open andralex opened this issue 5 years ago • 33 comments

I've redone the pure functions section to improve definitions. I'm aiming for a narrow and precise definition of strongly pure functions; following that, aggressive optimizations can be applied. cc @WalterBright @tgehr @dnadlinger @JohanEngelen @redstar

andralex avatar Apr 18 '19 04:04 andralex

Thanks for your pull request, @andralex!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

dlang-bot avatar Apr 18 '19 04:04 dlang-bot

N.B.: this is best viewed as a split diff.

andralex avatar Apr 18 '19 04:04 andralex

Searching for trailing whitespace
./spec/function.dd:411:$(P A pure function that passes all tests for strongly pure functions except (4) is called a 

thewilsonator avatar Apr 18 '19 05:04 thewilsonator

I think there may be the idea that strong and weak purity mean "can optimize" and "cannot optimize", but this is not true. (For example, a weakly pure function that may only modify data that is not accessed later can be elided. There can also be optimizations based on pure functions accessing only const arguments.) I believe however that this is what probably motivated the special case for void return.

I have no idea why there would be a special case for functions with zero arguments. This does not exist now and it contradicts existing compiler behavior:

int[] foo() pure
{
    return [1, 2, 3];
}
void main()
{
    immutable a = foo(); // ok
}

tgehr avatar Apr 18 '19 11:04 tgehr

Also, I think it is a bit weird that the spec defines "weakly pure" and then "strongly pure" as "pure but not weakly pure".

tgehr avatar Apr 18 '19 11:04 tgehr

I think there may be the idea that strong and weak purity mean "can optimize" and "cannot optimize", but this is not true. (For example, a weakly pure function that may only modify data that is not accessed later can be elided. There can also be optimizations based on pure functions accessing only const arguments.)

I'm trying to define strong purity as "definitely have freedom to optimize in functional tradition, even when definition is not available". Weak purity is "may or may not depending on information at hand".
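As a sketch of the kind of optimization meant here (hypothetical `square` function, assumed to qualify as strongly pure under the proposed definition):

```d
int square(int x) pure; // strongly pure: value parameter, no mutable indirections

void h(int a)
{
    int r1 = square(a);
    int r2 = square(a); // same argument: a compiler may reuse r1 here,
                        // even without seeing square's definition
}
```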

Also: the weakly pure definition is more conservative than it could be. Consider:

pure void fun(int input, out int output);

This should be categorized as a pure function, because it's just an awkward way of writing:

pure int fun(int input);

By this PR, however, it's categorized as weakly pure. Making it strongly pure would complicate the definition while not allowing additional useful cases.

andralex avatar Apr 18 '19 12:04 andralex

I have no idea why there would be a special case for functions with zero arguments. This does not exist now and it contradicts existing compiler behavior:

int[] foo() pure
{
    return [1, 2, 3];
}
void main()
{
    immutable a = foo(); // ok
}

Can you please explain that further? By this PR the function would need to return fresh memory every time, which is the right thing to do.

andralex avatar Apr 18 '19 12:04 andralex

Also, I think it is a bit weird that the spec defines "weakly pure" and then "strongly pure" as "pure but not weakly pure".

What would be a better way?

The way I went for maximum clarity was to define weakly pure first. Then to make clear that a function can be either weakly pure or strongly pure (but not neither or both), I mention that a strongly pure function is a pure function that is not weakly pure. THEN just to make things crystal clear the doc goes ahead and enumerates the conditions for strong purity.

Are you thinking it's better to define strongly pure first and then just say "if it's pure but not strongly pure, it's weakly pure"?

andralex avatar Apr 18 '19 12:04 andralex

@tgehr Consider:

int[] foo() pure
{
    return [1, 2, 3];
}
void main()
{
    immutable a = foo(); // ok
    auto b = foo();
    assert(b == a);
    assert(b !is a);
}

This currently passes, which is good. If foo() were strongly pure it would return the same array.

andralex avatar Apr 18 '19 12:04 andralex

Direct link to html for ease of viewing: http://dtest.dlang.io/artifact/website-008142779c13e1c98c82d000287b2780f73fb9eb-2c9b19a3e85d4bd5b9ca85d217459224/web/spec/function.html#pure-functions

andralex avatar Apr 18 '19 15:04 andralex

I think there may be the idea that strong and weak purity mean "can optimize" and "cannot optimize", but this is not true. (For example, a weakly pure function that may only modify data that is not accessed later can be elided. There can also be optimizations based on pure functions accessing only const arguments.)

I'm trying to define strong purity as "definitely have freedom to optimize in functional tradition, even when definition is not available".

Besides not being necessary for that, it is also not sufficient (with the current language definition). E.g., Haskell has non-deterministic exception semantics, to allow more rewrites. Furthermore, Haskell does not expose reference identities for immutable values.

Weak purity is "may or may not depending on information at hand". ...

Strong vs weak purity isn't really about optimizations. I think it's mostly a marketing thing to avoid the claim that D does not have "true" functional purity (which it really does not because e.g., you can't have an immutable list data type with value semantics due to the is operator).
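A minimal sketch of the `is` issue being described (two equal immutable arrays remain distinguishable by identity):

```d
void main()
{
    immutable int[] a = [1, 2, 3];
    immutable int[] b = [1, 2, 3];
    assert(a == b);   // equal as values
    assert(a !is b);  // but the two separate allocations are distinguishable
                      // via is, so value semantics leak
}
```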

The important distinction is between what you call weakly pure functions and pure factory functions, because it affects implicit conversions. Currently, the pull request says pure factory functions must have at least one argument.

Also: the weakly pure definition is more conservative than it could be. Consider:

pure void fun(int input, out int output);

This should be categorized as a pure function, because it's just an awkward way of writing:

pure int fun(int input);

By this PR, however, it's categorized as weakly pure. Making it strongly pure would complicate the definition while not allowing additional useful cases.

It's weakly pure because it modifies a parameter by reference. That does not mean the compiler should not optimize fun(input,output1); fun(input,output2); to fun(input,output1); output2=output1;
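The rewrite being described could be sketched like this (using the hypothetical signature from the discussion above):

```d
pure void fun(int input, out int output);

void caller(int x)
{
    int o1, o2;
    fun(x, o1);
    fun(x, o2); // same input: a compiler could legally rewrite this call
                // to o2 = o1;
}
```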

tgehr avatar Apr 18 '19 19:04 tgehr

I have no idea why there would be a special case for functions with zero arguments. This does not exist now and it contradicts existing compiler behavior:

int[] foo() pure
{
    return [1, 2, 3];
}
void main()
{
    immutable a = foo(); // ok
}

Can you please explain that further? By this PR the function would need to return fresh memory every time, which is the right thing to do.

By this PR the implicit conversion to immutable should not compile because foo has no arguments hence is not a pure factory function. Also, I think that's not "the right thing to do". Guaranteeing to preserve reference identities for immutable references is costly.

tgehr avatar Apr 18 '19 20:04 tgehr

Also, I think it is a bit weird that the spec defines "weakly pure" and then "strongly pure" as "pure but not weakly pure".

What would be a better way? ...

Are you thinking it's better to define strongly pure first and then just say "if it's pure but not strongly pure, it's weakly pure"?

Yes, I think it is, but of course that's fully up to you, as it does not influence correctness.

tgehr avatar Apr 18 '19 20:04 tgehr

@tgehr Consider:

int[] foo() pure
{
    return [1, 2, 3];
}
void main()
{
    immutable a = foo(); // ok
    auto b = foo();
    assert(b == a);
    assert(b !is a);
}

This currently passes, which is good. If foo() were strongly pure it would return the same array.

"Might", not "would". Also, you are justifying why you think strongly pure functions should not have indirections in their return values, not why they must have arguments. It really does not make sense to require them to have arguments. (The single argument could be of an empty struct type.)

tgehr avatar Apr 18 '19 20:04 tgehr

Strong and weak purity isn't really about optimizations. I think it's mostly a marketing thing to avoid the claim that D does not have "true" functional purity (which it really does not because e.g., you can't have an immutable list data type with value semantics due to the is operator).

I'm not that concerned about marketing as much as about making decisions that keep things simple and meaningful. In that sense allowing optimizations is a litmus test - if we can't do it we might have taken the wrong turn someplace. Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure.

I think I have a response to the parenthetical remark - a revised version is upcoming which restricts what "equivalent parameters" means.

andralex avatar Apr 18 '19 21:04 andralex

@tgehr eliminated the no-args requirement in the upcoming revision

andralex avatar Apr 18 '19 21:04 andralex

@tgehr followed your advice by placing definition of strongly pure first. Then made weakly pure definition as the complement. Things seem a lil cleaner. Thx.

andralex avatar Apr 18 '19 22:04 andralex

Strong and weak purity isn't really about optimizations. I think it's mostly a marketing thing to avoid the claim that D does not have "true" functional purity (which it really does not because e.g., you can't have an immutable list data type with value semantics due to the is operator).

I'm not that concerned about marketing as much as about making decisions that keep things simple and meaningful.

I was concerned about conflating strongly pure, which originated as a marketing concept used to justify weak purity, with language semantics. The PR makes more sense with your latest edits. (Or I missed the part restricting reordering before. I have no way to tell as the previous version disappeared after a force push.)

In that sense allowing optimizations is a litmus test - if we can't do it we might have taken the wrong turn someplace.

That's fine, as long as it is with an understanding that the set of specified optimizations is a small subset of what is possible, and not a comprehensive list.

Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure. ...

Neither do I, but that might make it necessary to specify the complete set of allowed rewrites. (Explicitly or implicitly.)

Some possible bad outcomes that I really want to avoid:

  • void returns become optimization blockers.
  • weakly pure functions become optimization blockers.
  • __metadata in arguments become optimization blockers, turning off purity-based optimizations in the presence of lazy evaluation or reference counting.
  • __metadata in arguments implies at most weak purity.

tgehr avatar Apr 18 '19 22:04 tgehr

I was concerned about conflating strongly pure, which originated as a marketing concept used to justify weak purity, with language semantics. The PR makes more sense with your latest edits. (Or I missed the part restricting reordering before. I have no way to tell as the previous version disappeared after a force push.)

Yeah, I switched to no-amend commits going forward.

In that sense allowing optimizations is a litmus test - if we can't do it we might have taken the wrong turn someplace.

That's fine, as long as it is with an understanding that the set of specified optimizations is a small subset of what is possible, and not a comprehensive list.

Cool!

Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure. ...

Neither do I, but that might make it necessary to specify the complete set of allowed rewrites. (Explicitly or implicitly.)

That would be acceptable. And don't forget - fewer is not a disaster.

Some possible bad outcomes that I really want to avoid:

Allow me to put on my pointy-haired manager hat (or wig) and intersperse the top-level view with these:

  • void returns become optimization blockers.

Very low impact. Pure void functions will be very rare, if any, and don't forget we currently do zero optimizations based on purity. Yet nobody holds a gun to our head.

  • weakly pure functions become optimization blockers.

Low impact.

  • __metadata in arguments become optimization blockers, turning off purity-based optimizations in the presence of lazy evaluation or reference counting.

Low impact.

  • __metadata in arguments implies at most weak purity.

Low impact.

What is high impact:

  • Reference counted data structures usable with immutable and in pure functions
  • Lazy initialization of members

These are what matters. I would very happily give away any and all items on your list to get these two. And it would be great if you did, too.

andralex avatar Apr 19 '19 00:04 andralex

What is high impact:

* Reference counted data structures usable with `immutable` and in `pure` functions

* Lazy initialization of members

Why not take advantage of purity to temporarily disable ref counting, knowing that at the end of the function there will be no leaks? Otherwise, purity means many things, among which is free scalability in a multithreaded setup, and a hidden detail like ref counting isn't supposed to affect that.

SSoulaimane avatar Apr 19 '19 11:04 SSoulaimane

@SSoulaimane Sounds intriguing, but there's a ton of detail you'd need to bring. How would this work?

pure rcstring fun(rcstring a, rcstring b) { a ~= 'x'; return a ~ b; }

andralex avatar Apr 19 '19 12:04 andralex

... Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure. ...

Neither do I, but that might make it necessary to specify the complete set of allowed rewrites. (Explicitly or implicitly.)

That would be acceptable. And don't forget - fewer is not a disaster. ...

The point of the rewrites is to define the meaning of pure and immutable.

Some possible bad outcomes that I really want to avoid:

Allow me to put my pointy-haired manager hat (or wig) by interspersing the top-level view with these: ...

It was you who said: "Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure."

Pure void functions will be very rare, if any, and don't forget we currently do zero optimizations based on purity. Yet nobody holds a gun to our head. ...

That's not the point. Currently, pure and immutable have well-defined meanings from which such optimizations can be derived (note that pure function elision can't even be derived if we require pure functions to preserve non-termination). With __mutable this goes out of the window, so you need an alternative way to specify what they mean. I was aiming for a state where we don't lose what we already have with pure and immutable.

What is high impact:

  • Reference counted data structures usable with immutable and in pure functions

I guess you mean nominally immutable. I don't really see the point of that. It's rare to want to use the same complex data structure as both mutable and immutable, and immutable structs and class references can't be reassigned, limiting their usefulness. Furthermore, it can make sense to store mutable data within a persistent data structure.

  • Lazy initialization of members ...

I can do lazy initialization of members today, even within a weakly pure function.

These are what matters.

I thought you wanted to support general manual memory management within pure functions, not just reference counting?

I would very happily give away any and all items on your list to get these two. And it would be great if you did, too.

I'm just not very excited to have merely nominal immutability in the type system, and I am certainly not the only one. It means whenever there is an immutable in the code, I have to worry about whether or not it is a lie. Libraries will require const arguments in the misguided name of "const correctness", arguing that I should just use __mutable fields if I really need to do some mutation in a callback. If I want to avoid the synchronization overhead, then I will need to make sure nobody can create an immutable instance of my type. At least right now, function signatures are honest. I don't really see the point of watering down attributes such that they can be legally slapped even on code that does not behave the way the attribute would suggest, not even logically.

tgehr avatar Apr 19 '19 14:04 tgehr

It was you who said: "Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure."

Yes, but that's not a hard requirement. In fact, if we drop strongly pure entirely and stay with weakly pure, that's acceptable if there's no other way to get work done on reference counting.

Pure void functions will be very rare, if any, and don't forget we currently do zero optimizations based on purity. Yet nobody holds a gun to our head. ...

That's not the point. Currently, pure and immutable have well-defined meanings from which such optimizations can be derived (note that pure function elision can't even be derived if we require pure functions to preserve non-termination). With __mutable this goes out of the window, so you need an alternative way to specify what they mean. I was aiming for a state where we don't lose what we already have with pure and immutable.

Maybe we have too much with pure and immutable, and much of it is low-impact, like the ability to do optimizations that have not been done in a decade. What we need is high impact: reference counting without compiler magic.

  • Reference counted data structures usable with immutable and in pure functions

I guess you mean nominally immutable. I don't really see the point of that. It's rare to want to use the same complex data structure as both mutable and immutable, and immutable structs and class references can't be reassigned, limiting their usefulness. Furthermore, it can make sense to store mutable data within a persistent data structure.

This might be the case with complex data structures, but it is not with strings and slices. Currently string and generally T[] work nicely in pure functions and in conjunction with immutable. A simple rephrasing of the challenge is: design a type rcstring and a type Slice!T that do what string and T[] do, just that they manage their own memory instead of relying on the tracing collector.

Can you lead such an effort?

  • Lazy initialization of members ...

I can do lazy initialization of members today, even within a weakly pure function.

Not in an immutable or const object. This is the problem, and a very real one.
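A sketch of the problem (the `Cache` type and its members are hypothetical, chosen to illustrate the point):

```d
struct Cache
{
    private int value;
    private bool computed;

    int get() const
    {
        if (!computed)
        {
            value = 42;      // error: cannot modify this.value in a const method
            computed = true; // ditto; lazy initialization is blocked
        }
        return value;
    }
}
```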

These are what matters.

I thought you wanted to support general manual memory management within pure functions, not just reference counting?

I'd be happy with just reference counting if that's all that's possible.

I would very happily give away any and all items on your list to get these two. And it would be great if you did, too.

I'm just not very excited to have merely nominal immutability in the type system, and I am certainly not the only one. It means whenever there is an immutable in the code, I have to worry about whether or not it is a lie.

NO it's not a lie! The point is to allow controlled use of metadata that is transparent to the user of the immutable data structure.

Is tracing GC a lie? Because it does make mutable memory into immutable memory (construction) and immutable memory back into modifiable memory (reclamation). It is also easy to circumvent in any number of ways. Yet nobody is losing sleep over it.

Libraries will require const arguments in the misguided name of "const correctness", arguing that I should just use __mutable fields if I really need to do some mutation in a callback.

That is working just fine for C++. A good success story to learn from.

If I want to avoid the synchronization overhead, then I will need to make sure nobody can create an immutable instance of my type.

I don't understand this.

At least right now, function signatures are honest. I don't really see the point of watering down attributes such that they can be legally slapped even on code that does not behave the way the attribute would suggest, not even logically.

I understand, and I don't want to be there either. But I also reckon that without workable reference counting we're dead in the water.

Sadly we have re-entered an unhelpful pattern.

  1. I try to break the stalemate

  2. I don't know what I am doing, and both you and I know I don't know what I'm doing.

  3. You rightly poke holes in my proposal, pointing out that I'm making something worse and overall that I don't know what I'm doing.

  4. I give up and we're back to where we were. How can I not? You know what you're doing, and your counterarguments are correct.

  5. There is no progress - no reference counting for the next ten years.

How can we get from this pattern to a pattern whereby you actually do steer things to the positive? What steps can we take to get reference counted data structures while at the same time preserving most of the good qualities of pure and immutable? That's the real challenge, not to say it can't be done. My mailman can tell me it can't be done. We need your good skill put to work in the right direction.

andralex avatar Apr 19 '19 15:04 andralex

You can't elide strongly pure calls down to zero, because they can throw exceptions; the call must happen at least once. So a void function is fine as strongly pure; a use case for it is data validation. A void function that does nothing - that's low impact.
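A sketch of the point about exceptions (the validation function is hypothetical):

```d
void validate(int x) pure
{
    if (x < 0)
        throw new Exception("negative input");
}

void g(int x)
{
    validate(x); // no result is used, yet eliding the call entirely
    validate(x); // would swallow a potential exception; only the
                 // duplicate call may be removed
}
```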

Maybe you can reorder two pure nothrow functions between each other, but not anything else, again because exceptions.

Lazy initialization doesn't seem to conflict with purity. Reference counting can be done as weakly pure:

void* addRef() pure immutable
{
  _counter++;
  return null;
}

Also postblit should be weakly pure too.

new and malloc are special cases and should be in the list of special cases.

anon17 avatar Apr 19 '19 15:04 anon17

You can't elide strongly pure calls down to zero, because they can throw exceptions; the call must happen at least once.

Good insight, thanks. It can throw errors or abort the program even if nothrow. I can't believe I'm saying this, but that's a relief :o).

andralex avatar Apr 19 '19 15:04 andralex

Lazy initialization doesn't seem to conflict with purity. Reference counting can be done as weakly pure:

void* addRef() pure immutable
{
  _counter++;
  return null;
}

You can't update the counter of an immutable structure. (It's actually a pointer to a counter.)
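The error being described, sketched on a simplified version of the snippet above:

```d
struct RC
{
    int _counter;

    void addRef() pure immutable
    {
        _counter++; // error: cannot modify immutable expression this._counter
    }
}
```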

Also postblit should be weakly pure too.

Good news - we don't need to worry about postblit :o).

new and malloc are special cases and should be in the list of special cases.

Yes to new, no to malloc.

andralex avatar Apr 19 '19 15:04 andralex

Eliminated the void return test.

andralex avatar Apr 20 '19 17:04 andralex

@andralex I'd say just do the ref counting. Escape the type system. Improve later.

SSoulaimane avatar Apr 20 '19 18:04 SSoulaimane

@SSoulaimane we have that already... https://github.com/dlang/druntime/pull/2348

andralex avatar Apr 20 '19 18:04 andralex

Mutable data can be stored like this:

struct __mutable(T)
{
    private size_t _ref;
    this(T val){ _ref=cast(size_t)val; }
    T unwrap() const
    {
        return cast(T)_ref;
    }
}

struct A
{
    __mutable!(int*) rc;
    this(int) immutable
    {
        rc=new int;
    }
    int inc() immutable
    {
        return ++*rc.unwrap;
    }
}

int main()
{
    immutable A b=0;
    assert(b.inc==1);
    assert(b.inc==2);
    return 0;
}

anon17 avatar Apr 22 '19 11:04 anon17