phobos
phobos copied to clipboard
Fix Issue 15798 - std.algorithm.mutation.copy takes target by value
copy is designed in such a way that passing in a generic output range as the target value makes no sense, as copy is supposed to return the "remainder" of target. As such, its inputs should be restricted to inputs that are assignable input ranges or slices.
Consider the following from the stackoverflow example
import std.stdio;
import std.digest.digest;
import std.digest.md;
import std.algorithm;
void main() {
string s = "Hello!\n";
auto d1 = makeDigest!MD5;
auto d2 = makeDigest!MD5;
foreach (ubyte b; s) {
d1.put(b);
}
s.copy(d2);
writeln(digest!MD5(s).toHexString);
writeln(d1.finish().toHexString);
writeln(d2.finish().toHexString);
}
This outputs
E134CED312B3511D88943D57CCD70C83
E134CED312B3511D88943D57CCD70C83
D41D8CD98F00B204E9800998ECF8427E
Because copy takes the target by value. What the user should have done is use std.range.primitives.put to copy an input range to an output range which is taken by reference.
This change adds a deprecation message that alerts the user. Any usage of copy in this way is very likely deprecating broken code.
Thanks for your pull request, @JackStouffer!
Bugzilla references
| Auto-close | Bugzilla | Description |
|---|---|---|
| ✓ | 15798 | std.algorithm.mutation.copy takes target by value |
@CyberShadow In this specific instance you can call lockingTextWriter.put. But it's better to encourage people to use the global put as it takes care of the type of use cases where output ranges only define a char accepting put, and a string is passed.
I get that using std.range.put isn't pretty, but it is more correct. This code worked by chance and not by design.
But it's not UFCS-able :(
I get that using std.range.put isn't pretty, But it's not UFCS-able :(
Wasn't this example one of the few prime examples of how elegant D code can look like? :disappointed:
UFCS is one of the more beautiful parts of D. Even though the code worked by chance, it saddens me that code like this now has to become uglier.
I often try to structure my code to improve its UFCS-ability, e.g. [1] [2]. I remember using this very pattern (for copying a range to a file) some time recently as well.
There is a solution here: define some function X (we can bikeshed this) that just forwards to put, but because it's not called put we don't have to worry about name collision with the method of the type. That way it can be used in UFCS chains.
Yep, + invert args. I guess .reverseArgs!put will work now, but it's not really idiomatic. Anyway not a blocker I guess.
I'm not 100% convinced this is the right path. "it should only work for arrays" seems short-sighted. It should only work for output ranges that don't copy their state by value. However, we have no mechanism to test for this. Even if we made it a rule that output ranges must be reference types, there's no static way to check for this.
Is there no other path forward here?
@schveiguy it also works for ranges with assignable values a-la the retro example with copy.
it also works for ranges with assignable values
That's not necessarily a requirement:
struct HasAssignableElements
{
int front;
void popFront() {front = 0;}
enum empty = false;
}
@schveiguy I don't understand what that's supposed to illustrate. This would be treated as an output range by copy before this change and would discard all the values it's given except the last.
What I'm trying to say is, I don't think there's a static test that accepts all ranges that would be valid for copy and rejects all ranges that would be invalid.
For an example of something that should be fine with copy but rejected after your change (I think):
struct List(T)
{
T value;
List *next;
}
struct ListRange(T) // should be OK to use with copy
{
List!T *cur;
ref T front() { return cur.value; }
void popFront() { cur = cur.next; }
bool empty() { return cur == null; }
}
Wait, I'm misunderstanding your tests. You are looking at ranges that don't have assignable elements, are input ranges, but are also output ranges. This... seems very odd.
@schveiguy Specifically, I'm disallowing all TargetRanges which are output ranges but are also input ranges, because according to put any input range with assignable elements is an output range.
I'm trying to limit the deprecation to those output ranges which are put based.
OK, I am confusing the TargetRange and the SourceRange checks. Regrouping...
I'm focusing on the wrong thing. It's not valid input ranges that can be output ranges that cause problems, it's the output-only ranges that should also work. LockingTextWriter should work with copy, even if passed by value, because you aren't discarding the data. This is going to break lots of code that uses i/o via output ranges.
The real issue here is that copy needs to know whether the target is putting the data outside itself (and therefore can be passed by value). A crude test may be whether it has any pointers in it. But that still may not be enough, as you don't know if the pointers relate to the data itself, or something else (an unrelated string or something). As I said earlier, I don't think we can create such a test, unless we have a convention (a-la save, but that hasn't been stellar either).
The problem with this fix is that it purports to have such a test, but the test isn't good enough. This means you will be rejecting valid calls (such as with LockingTextWriter) and causing a different kind of confusion (why can't I call copy with XYZ?).
I wonder if checking for purity could help in some way...
@schveiguy I agree, it's not a perfect solution. There's no way to know a-priori if a output range is properly copy-able by value. But, the way we've figured out how to get around the problem is to always use ref when passing them around. It sidesteps the whole issue.
Like I noted above, LockingTextWriter just happens to work. There are many other forms of output ranges which fail silently with copy as it exists today. Further, copy's semantics are designed around the idea of array-like inputs (all of its tests using arrays at the bottom isn't an accident IMO). Most notably, the concept of copy returning the "unfilled" portion of the output range is not applicable to put based output ranges.
the way we've figured out how to get around the problem is to always use ref when passing them around
What about taking the second overload (i.e. output ranges that are not input ranges with assignable values) by auto ref? That would allow the "bad" cases that actually work (such as LockingTextWriter) and fix copy for the digest cases.
What about taking the second overload by
auto ref?
My problem with that is you still have the issue that auto ref will allow users to do the wrong thing and many output ranges will fail silently. I'd rather not give the user the chance to make a mistake, if possible.
Ping @jmdavis. You had a comment https://github.com/dlang/phobos/pull/5991#discussion_r159545584 on my output range toString PR which I think is relevant here, so I'd like to get your opinion.
many output ranges will fail silently
I don't think there are many output ranges that have this problem. Most reference the data anyway, or are lvalues.
I am on the side of not breaking working, valid, reasonable code like the example stdin.byLineCopy(Yes.keepTerminator).array.sort.copy(stdout.lockingTextWriter).
have you considered changing copy such that it takes its target by auto ref?
We can't do that completely, because that changes the semantics of copy with forward ranges that are input ranges with assignable elements:
auto buffer = new int[100];
iota(15).copy(buffer);
writeln(buffer[0 .. 15]);
Currently, this will print the numbers from 0 to 14. If copy is auto-ref, it will print 15 zeros.
I think we can use @JackStouffer's identification of the types of ranges that copy has problems with in this PR and just make THAT overload auto-ref. See https://github.com/dlang/phobos/pull/6039#issuecomment-358125018
As no one pointed this out before - Jenkins failure is at tools:
changed.d(122): Deprecation: function std.algorithm.mutation.copy!(MapResult!(to, FilterResult!(__lambda5, Splitter!(cast(Flag)false, char[], Wrapper))), Appender!(int[])).copy is deprecated - std.algorithm.mutation.copy to non-arrays should be replaced with std.range.primitives.put
https://github.com/dlang/tools/blob/master/changed.d#L122
@schveiguy @andralex @CyberShadow Removed the deprecation and made the new overload take the output range by auto ref. For the new overload, I also made it return void to avoid (heh) confusion with returning "unfilled" portions of output ranges when that doesn't apply.
I like this direction better.
I have considered further that it's possible an output range accepted by reference may break some code. Namely an output range that does put elements into an external location, but isn't expected to affect the lvalue that was used.
It might be possible to deprecate this functionality, but it might be too clever. If you pass in an lvalue by reference, you can pragma(msg, "Warning, in version 2.0XX, this call will accept the target by reference, and affect the original"), and then make a copy before using it.
I would like to hear from @andralex about the implications of using auto ref here, particularly surrounding this scenario: https://github.com/dlang/phobos/pull/6039#issuecomment-358347576
But other than that, LGTM.
Fixed
Ping @andralex