dlang.org
Improve pure functions spec
I've redone the pure functions section to improve definitions. I'm aiming for a narrow and precise definition of strongly pure functions; following that, aggressive optimizations can be applied. cc @WalterBright @tgehr @dnadlinger @JohanEngelen @redstar
Thanks for your pull request, @andralex!
Bugzilla references
Your PR doesn't reference any Bugzilla issue.
If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
N.B.: this is best viewed as a split diff.
Searching for trailing whitespace
./spec/function.dd:411:$(P A pure function that passes all tests for strongly pure functions except (4) is called a
I think there may be the idea that strong and weak purity mean "can optimize" and "cannot optimize", but this is not true. (For example, a weakly pure function that may only modify data that is not accessed later can be elided. There can also be optimizations based on pure functions accessing only const arguments.) I believe, however, that this is what probably motivated the special case for void return.
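To make the elision point concrete, here is a minimal illustration (my own example, not from the PR): fun is only weakly pure because it writes through a pointer parameter, yet a compiler that can prove the pointee is never read after the call may still remove the call.
pure void fun(int* p)
{
    *p = 42; // mutates caller-visible state, so fun is only weakly pure
}

void main()
{
    int x;
    fun(&x); // x is never read afterwards, so the call is a candidate for elision
}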
I have no idea why there would be a special case for functions with zero arguments. This does not exist now and it contradicts existing compiler behavior:
int[] foo() pure
{
    return [1, 2, 3];
}

void main()
{
    immutable a = foo(); // ok
}
Also, I think it is a bit weird that the spec defines "weakly pure" and then "strongly pure" as "pure but not weakly pure".
I think there may be the idea that strong and weak purity mean "can optimize" and "cannot optimize", but this is not true. (For example, a weakly pure function that may only modify data that is not accessed later can be elided. There can also be optimizations based on pure functions accessing only const arguments.)
I'm trying to define strong purity as "definitely have freedom to optimize in functional tradition, even when definition is not available". Weak purity is "may or may not depending on information at hand".
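A sketch of what that freedom buys (my own illustration; sq and twice are hypothetical names): with a strongly pure sq, a compiler may merge repeated calls even though only the declaration is visible.
pure int sq(int x); // declaration only; the definition lives in some unavailable module

int twice(int x)
{
    // A legal rewrite for a strongly pure sq: auto t = sq(x); return t + t;
    return sq(x) + sq(x);
}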
Also: the weakly pure definition is more conservative than it could be. Consider:
pure void fun(int input, out int output);
This should be categorized as a strongly pure function, because it's just an awkward way of writing:
pure int fun(int input);
By this PR, however, it's categorized as weakly pure. Making it strongly pure would complicate the definition while not allowing additional useful cases.
I have no idea why there would be a special case for functions with zero arguments. This does not exist now and it contradicts existing compiler behavior:
int[] foo() pure
{
    return [1, 2, 3];
}

void main()
{
    immutable a = foo(); // ok
}
Can you please explain that further? By this PR the function would need to return fresh memory every time, which is the right thing to do.
Also, I think it is a bit weird that the spec defines "weakly pure" and then "strongly pure" as "pure but not weakly pure".
What would be a better way?
The way I went for maximum clarity was to define weakly pure first. Then to make clear that a function can be either weakly pure or strongly pure (but not neither or both), I mention that a strongly pure function is a pure function that is not weakly pure. THEN just to make things crystal clear the doc goes ahead and enumerates the conditions for strong purity.
Are you thinking it's better to define strongly pure first and then just say "if it's pure but not strongly pure, it's weakly pure"?
@tgehr Consider:
int[] foo() pure
{
    return [1, 2, 3];
}

void main()
{
    immutable a = foo(); // ok
    auto b = foo();
    assert(b == a);
    assert(b !is a);
}
This currently passes, which is good. If foo() were strongly pure it would return the same array.
Direct link to html for ease of viewing: http://dtest.dlang.io/artifact/website-008142779c13e1c98c82d000287b2780f73fb9eb-2c9b19a3e85d4bd5b9ca85d217459224/web/spec/function.html#pure-functions
I think there may be the idea that strong and weak purity mean "can optimize" and "cannot optimize", but this is not true. (For example, a weakly pure function that may only modify data that is not accessed later can be elided. There can also be optimizations based on pure functions accessing only const arguments.)
I'm trying to define strong purity as "definitely have freedom to optimize in functional tradition, even when definition is not available".
Besides not being necessary for that, it is also not sufficient (with the current language definition). E.g., Haskell has non-deterministic exception semantics, to allow more rewrites. Furthermore, Haskell does not expose reference identities for immutable values.
Weak purity is "may or may not depending on information at hand". ...
Strong vs weak purity isn't really about optimizations. I think it's mostly a marketing thing to avoid the claim that D does not have "true" functional purity (which it really does not because e.g., you can't have an immutable list data type with value semantics due to the is operator).
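For the record, here is the is remark in code (my own example): two structurally equal immutable arrays are still distinguishable by identity, which is what rules out pure value semantics.
void main()
{
    immutable int[] a = [1, 2, 3];
    immutable int[] b = [1, 2, 3];
    assert(a == b);     // equal as values
    bool same = a is b; // identity remains observable, so these are not pure values
}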
The important distinction is between what you call weakly pure functions and pure factory functions, because it affects implicit conversions. Currently, the pull request says pure factory functions must have at least one argument.
Also: the weakly pure definition is more conservative than it could be. Consider:
pure void fun(int input, out int output);
This should be categorized as a strongly pure function, because it's just an awkward way of writing:
pure int fun(int input);
By this PR, however, it's categorized as weakly pure. Making it strongly pure would complicate the definition while not allowing additional useful cases.
It's weakly pure because it modifies a parameter by reference. That does not mean the compiler should not optimize `fun(input, output1); fun(input, output2);` into `fun(input, output1); output2 = output1;`.
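Spelled out (my own sketch; the body of fun is hypothetical): both calls receive the same input, so the second result may be copied from the first even though fun is only weakly pure.
pure void fun(int input, out int output)
{
    output = input * 2; // hypothetical body
}

void main()
{
    int output1, output2;
    fun(1, output1);
    fun(1, output2); // may be rewritten as: output2 = output1;
}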
I have no idea why there would be a special case for functions with zero arguments. This does not exist now and it contradicts existing compiler behavior:
int[] foo() pure
{
    return [1, 2, 3];
}

void main()
{
    immutable a = foo(); // ok
}
Can you please explain that further? By this PR the function would need to return fresh memory every time, which is the right thing to do.
By this PR the implicit conversion to immutable should not compile because foo has no arguments, hence is not a pure factory function. Also, I think that's not "the right thing to do". Guaranteeing to preserve reference identities for immutable references is costly.
Also, I think it is a bit weird that the spec defines "weakly pure" and then "strongly pure" as "pure but not weakly pure".
What would be a better way? ...
Are you thinking it's better to define strongly pure first and then just say "if it's pure but not strongly pure, it's weakly pure"?
Yes, I think it is, but of course that's fully up to you, as it does not influence correctness.
@tgehr Consider:
int[] foo() pure
{
    return [1, 2, 3];
}

void main()
{
    immutable a = foo(); // ok
    auto b = foo();
    assert(b == a);
    assert(b !is a);
}
This currently passes, which is good. If foo() were strongly pure it would return the same array.
"Might", not "would". Also, you are justifying why you think strongly pure functions should not have indirections in their return values, not why they must have arguments. It really does not make sense to require them to have arguments. (The single argument could be of an empty struct type.)
Strong and weak purity isn't really about optimizations. I think it's mostly a marketing thing to avoid the claim that D does not have "true" functional purity (which it really does not because e.g., you can't have an immutable list data type with value semantics due to the is operator).
I'm not that concerned about marketing as much as about making decisions that keep things simple and meaningful. In that sense allowing optimizations is a litmus test - if we can't do it we might have taken the wrong turn someplace. Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure.
I think I have a response to the parenthetical - revised version upcoming which restricts what "equivalent parameters" means.
@tgehr eliminated the no-args requirement in the upcoming revision
@tgehr followed your advice by placing definition of strongly pure first. Then made weakly pure definition as the complement. Things seem a lil cleaner. Thx.
Strong and weak purity isn't really about optimizations. I think it's mostly a marketing thing to avoid the claim that D does not have "true" functional purity (which it really does not because e.g., you can't have an immutable list data type with value semantics due to the is operator).
I'm not that concerned about marketing as much as about making decisions that keep things simple and meaningful.
I was concerned about conflating strongly pure, which originated as a marketing concept used to justify weak purity, with language semantics. The PR makes more sense with your latest edits. (Or I missed the part restricting reordering before. I have no way to tell as the previous version disappeared after a force push.)
In that sense allowing optimizations is a litmus test - if we can't do it we might have taken the wrong turn someplace.
That's fine, as long as it is with an understanding that the set of specified optimizations is a small subset of what is possible, and not a comprehensive list.
Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure. ...
Neither do I, but that might make it necessary to specify the complete set of allowed rewrites. (Explicitly or implicitly.)
Some possible bad outcomes that I really want to avoid:
- void returns become optimization blockers.
- weakly pure functions become optimization blockers.
- __metadata in arguments become optimization blockers, turning off purity-based optimizations in the presence of lazy evaluation or reference counting.
- __metadata in arguments implies at most weak purity.
I was concerned about conflating strongly pure, which originated as a marketing concept used to justify weak purity, with language semantics. The PR makes more sense with your latest edits. (Or I missed the part restricting reordering before. I have no way to tell as the previous version disappeared after a force push.)
Yeah, I switched to no-amend commits going forward.
In that sense allowing optimizations is a litmus test - if we can't do it we might have taken the wrong turn someplace.
That's fine, as long as it is with an understanding that the set of specified optimizations is a small subset of what is possible, and not a comprehensive list.
Cool!
Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure. ...
Neither do I, but that might make it necessary to specify the complete set of allowed rewrites. (Explicitly or implicitly.)
That would be acceptable. And don't forget - fewer is not a disaster.
Some possible bad outcomes that I really want to avoid:
Allow me to put on my pointy-haired manager hat (or wig) by interspersing the top-level view with these:
- void returns become optimization blockers.
Very low impact. Pure void functions will be very rare if any, and don't forget we currently do zero optimizations based on purity. Yet nobody holds a gun to our head.
- weakly pure functions become optimization blockers.
Low impact.
- __metadata in arguments become optimization blockers, turning off purity-based optimizations in the presence of lazy evaluation or reference counting.
Low impact.
- __metadata in arguments implies at most weak purity.
Low impact.
What is high impact:
- Reference counted data structures usable with immutable and in pure functions
- Lazy initialization of members
These are what matters. I would very happily give away any and all items on your list to get these two. And it would be great if you did, too.
What is high impact:
* Reference counted data structures usable with `immutable` and in `pure` functions
* Lazy initialization of members
Why not take advantage of purity to temporarily disable ref counting, knowing that at the end of the function there will be no leaks? Purity means many things, among which is free scalability in a multithreaded setup, and a hidden detail like ref counting isn't supposed to affect that.
@SSoulaimane Sounds intriguing, but there's a ton of detail you'd need to bring. How would this work?
pure rcstring fun(rcstring a, rcstring b)
{
    a ~= 'x';
    return a ~ b;
}
... Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure. ...
Neither do I, but that might make it necessary to specify the complete set of allowed rewrites. (Explicitly or implicitly.)
That would be acceptable. And don't forget - fewer is not a disaster. ...
The point of the rewrites is to define the meaning of pure and immutable.
Some possible bad outcomes that I really want to avoid:
Allow me to put my pointy-haired manager hat (or wig) by interspersing the top-level view with these: ...
It was you who said: "Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure."
Pure void functions will be very rare if any, and don't forget we currently do zero optimizations based on purity. Yet nobody holds a gun to our head. ...
That's not the point. Currently, pure and immutable have well-defined meanings from which such optimizations can be derived (note that pure function elision can't even be derived if we require pure functions to preserve non-termination). With __mutable this goes out of the window, so you need an alternative way to specify what they mean. I was aiming for a state where we don't lose what we already have with pure and immutable.
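The non-termination caveat in code (my own example): if pure functions must preserve non-termination, the dead call below cannot be elided, because removing it would let main terminate.
pure int loopIf(int n)
{
    while (n == 0) {} // never returns for n == 0
    return n;
}

void main()
{
    auto unused = loopIf(0); // result is dead, but eliding the call changes behavior
}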
What is high impact:
- Reference counted data structures usable with immutable and in pure functions
I guess you mean nominally immutable. I don't really see the point of that. It's rare to want to use the same complex data structure as both mutable and immutable, and immutable structs and class references can't be reassigned, limiting their usefulness. Furthermore, it can make sense to store mutable data within a persistent data structure.
- Lazy initialization of members ...
I can do lazy initialization of members today, even within a weakly pure function.
These are what matters.
I thought you wanted to support general manual memory management within pure functions, not just reference counting?
I would very happily give away any and all items on your list to get these two. And it would be great if you did, too.
I'm just not very excited to have merely nominal immutability in the type system, and I am certainly not the only one. It means whenever there is an immutable in the code, I have to worry about whether or not it is a lie. Libraries will require const arguments in the misguided name of "const correctness", arguing that I should just use __mutable fields if I really need to do some mutation in a callback. If I want to avoid the synchronization overhead, then I will need to make sure nobody can create an immutable instance of my type. At least right now, function signatures are honest. I don't really see the point of watering down attributes such that they can be legally slapped even on code that does not behave the way the attribute would suggest, not even logically.
It was you who said: "Also I don't want to define pure so as to lock us away from __metadata, or __metadata to lock us away from good optimizations of pure."
Yes, but that's not a hard requirement. In fact if we drop strongly pure entirely and stay with weakly pure that's acceptable if there's no other way to get work done on reference counting.
Pure void functions will be very rare if any, and don't forget we currently do zero optimizations based on purity. Yet nobody holds a gun to our head. ...
That's not the point. Currently, pure and immutable have well-defined meanings from which such optimizations can be derived (note that pure function elision can't even be derived if we require pure functions to preserve non-termination). With __mutable this goes out of the window, so you need an alternative way to specify what they mean. I was aiming for a state where we don't lose what we already have with pure and immutable.
Maybe we have too much with pure and immutable, and much of it low-impact - like the ability to do optimizations that have not been done in a decade. What we need is high impact: reference counting without compiler magic.
- Reference counted data structures usable with immutable and in pure functions
I guess you mean nominally immutable. I don't really see the point of that. It's rare to want to use the same complex data structure as both mutable and immutable, and immutable structs and class references can't be reassigned, limiting their usefulness. Furthermore, it can make sense to store mutable data within a persistent data structure.
This might be the case with complex data structures, but it is not with strings and slices. Currently string and generally T[] work nicely in pure functions and in conjunction with immutable. A simple rephrasing of the challenge is: design a type rcstring and a type Slice!T that do what string and T[] do, just that they manage their own memory instead of relying on the tracing collector.
Can you lead such an effort?
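A rough sketch of where the tension shows up (my own code, not a proposed design): copying an rcstring must bump a shared counter, which is exactly the mutation that immutable forbids without something like __mutable.
struct rcstring
{
    private immutable(char)[] data;
    private int* refs; // typed immutable(int*) inside an immutable rcstring, so ++*refs is illegal there

    this(this) { if (refs) ++*refs; }                              // postblit cannot run on immutable copies
    ~this() { if (refs && --*refs == 0) { /* free the buffer */ } }
}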
- Lazy initialization of members ...
I can do lazy initialization of members today, even within a weakly pure function.
Not in an immutable or const object. This is the problem, and a very real one.
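A minimal sketch of the problem (my own example; expensiveComputation is a hypothetical helper): the cached-member pattern does not compile through const, which is what blocks lazy initialization.
struct S
{
    private int _cache;
    private bool _done;

    int value() const
    {
        if (!_done)
        {
            // _cache = expensiveComputation(); // Error: cannot modify _cache in a const member function
            // _done = true;                    // same error
        }
        return _cache;
    }
}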
These are what matters.
I thought you wanted to support general manual memory management within pure functions, not just reference counting?
I'd be happy with just reference counting if that's all that's possible.
I would very happily give away any and all items on your list to get these two. And it would be great if you did, too.
I'm just not very excited to have merely nominal immutability in the type system, and I am certainly not the only one. It means whenever there is an immutable in the code, I have to worry about whether or not it is a lie.
NO it's not a lie! The point is to allow controlled use of metadata that is transparent to the user of the immutable data structure.
Is tracing GC a lie? Because it does make mutable memory into immutable memory (construction) and immutable memory back into modifiable memory (reclamation). It is also easy to circumvent in any number of ways. Yet nobody is spending nights losing sleep over it.
Libraries will require const arguments in the misguided name of "const correctness", arguing that I should just use __mutable fields if I really need to do some mutation in a callback.
That is working just fine for C++. A good success story to learn from.
If I want to avoid the synchronization overhead, then I will need to make sure nobody can create an immutable instance of my type.
I don't understand this.
At least right now, function signatures are honest. I don't really see the point of watering down attributes such that they can be legally slapped even on code that does not behave the way the attribute would suggest, not even logically.
I understand, and I don't want to be there either. But I also reckon that without workable reference counting we're dead in the water.
Sadly we have re-entered an unhelpful pattern.
- I try to break the stalemate.
- I don't know what I am doing, and both you and I know I don't know what I'm doing.
- You rightly poke holes in my proposal, pointing out that I'm making something worse and overall that I don't know what I'm doing.
- I give up and we're back to where we were. How can I not? You know what you're doing, and your counterarguments are correct.
- There is no progress - no reference counting for the next ten years.
How can we get from this pattern to a pattern whereby you actually do steer things to the positive? What steps can we take to get reference counted data structures while at the same time preserving most of the good qualities of pure and immutable? That's the real challenge, not to say it can't be done. My mailman can tell me it can't be done. We need your good skill put to work in the right direction.
You can't elide strongly pure calls down to zero, because they can throw exceptions; the function must be called at least once. So a void function is fine as strongly pure; a use case for it is data validation. A void function that does nothing - that's low impact.
Maybe you can reorder two pure nothrow calls with respect to each other, but not anything else, again because of exceptions.
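For instance (my own example): a pure call whose result goes unused still cannot simply be removed if it may throw, because the exception is an observable effect.
pure void validate(int x)
{
    if (x < 0)
        throw new Exception("negative input");
}

void main()
{
    validate(-1); // nothing is used, but eliding the call would swallow the exception
}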
Lazy initialization doesn't seem to conflict with purity. Reference counting can be done as weakly pure:
void* addRef() pure immutable
{
    _counter++;
    return null;
}
Also postblit should be weakly pure too.
new and malloc are special cases and should be in the list of special cases.
You can't elide strongly pure calls down to zero, because they can throw exceptions; the function must be called at least once.
Good insight, thanks. Can throw errors or abort the program even if nothrow. I can't believe I'm saying this, but that's a relief :o).
Lazy initialization doesn't seem to conflict with purity. Reference counting can be done as weakly pure:
void* addRef() pure immutable { _counter++; return null; }
You can't update the counter of an immutable structure. (It's actually a pointer to a counter.)
Also postblit should be weakly pure too.
Good news - we don't need to worry about postblit :o).
new and malloc are special cases and should be in the list of special cases.
Yes to new, no to malloc.
Eliminated the void return test.
@andralex I'd say just do the ref counting. escape the type system. improve later.
@SSoulaimane we have that already... https://github.com/dlang/druntime/pull/2348
Mutable data can be stored like this:
struct __mutable(T)
{
    // Stores the pointer as an integer so the type system loses track of the
    // indirection: immutable does not transitively apply to the pointee.
    private size_t _ref;
    this(T val) { _ref = cast(size_t) val; }
    T unwrap() const
    {
        return cast(T) _ref; // recovers a mutable pointer even from const/immutable
    }
}

struct A
{
    __mutable!(int*) rc;
    this(int) immutable
    {
        rc = new int; // the counter lives in ordinary mutable heap memory
    }
    int inc() immutable
    {
        return ++*rc.unwrap; // mutation through an immutable object
    }
}

int main()
{
    immutable A b = 0;
    assert(b.inc == 1);
    assert(b.inc == 2);
    return 0;
}