phobos icon indicating copy to clipboard operation
phobos copied to clipboard

Fix Issue 15798 - std.algorithm.mutation.copy takes target by value

Open JackStouffer opened this issue 7 years ago • 33 comments

copy is designed in such a way that passing in a generic output range as the target value makes no sense, as copy is supposed to return the "remainder" of target. As such, its inputs should be restricted to inputs that are assignable input ranges or slices.

Consider the following from the stackoverflow example

import std.stdio;
import std.digest.digest;
import std.digest.md;
import std.algorithm;

void main() {
    string s = "Hello!\n";
    auto d1 = makeDigest!MD5;
    auto d2 = makeDigest!MD5;
    foreach (ubyte b; s) {
        d1.put(b);
    }
    s.copy(d2);
    writeln(digest!MD5(s).toHexString);
    writeln(d1.finish().toHexString);
    writeln(d2.finish().toHexString);
}

This outputs

E134CED312B3511D88943D57CCD70C83
E134CED312B3511D88943D57CCD70C83
D41D8CD98F00B204E9800998ECF8427E

Because copy takes the target by value. What the user should have done is use std.range.primitives.put to copy an input range to an output range which is taken by reference.

This change adds a deprecation message that alerts the user. Any usage of copy in this way is very likely deprecating broken code.

JackStouffer avatar Jan 16 '18 18:01 JackStouffer

Thanks for your pull request, @JackStouffer!

Bugzilla references

Auto-close Bugzilla Description
15798 std.algorithm.mutation.copy takes target by value

dlang-bot avatar Jan 16 '18 18:01 dlang-bot

@CyberShadow In this specific instance you can call lockingTextWriter.put. But it's better to encourage people to use the global put as it takes care of the type of use cases where output ranges only define a char accepting put, and a string is passed.

JackStouffer avatar Jan 16 '18 19:01 JackStouffer

I get that using std.range.put isn't pretty, but it is more correct. This code worked by chance and not by design.

JackStouffer avatar Jan 16 '18 19:01 JackStouffer

But it's not UFCS-able :(

CyberShadow avatar Jan 16 '18 19:01 CyberShadow

I get that using std.range.put isn't pretty, But it's not UFCS-able :(

Wasn't this example one of the few prime examples of how elegant D code can look like? :disappointed:

wilzbach avatar Jan 16 '18 19:01 wilzbach

UFCS is one of the more beautiful parts of D. Even though the code worked by chance, it saddens me that code like this now has to become uglier.

I often try to structure my code to improve its UFCS-ability, e.g. [1] [2]. I remember using this very pattern (for copying a range to a file) some time recently as well.

CyberShadow avatar Jan 16 '18 19:01 CyberShadow

There is a solution here: define some function X (we can bikeshed this) that just forwards to put, but because it's not called put we don't have to worry about name collision with the method of the type. That way it can be used in UFCS chains.

JackStouffer avatar Jan 16 '18 19:01 JackStouffer

Yep, + invert args. I guess .reverseArgs!put will work now, but it's not really idiomatic. Anyway not a blocker I guess.

CyberShadow avatar Jan 16 '18 19:01 CyberShadow

I'm not 100% convinced this is the right path. "it should only work for arrays" seems short-sighted. It should only work for output ranges that don't copy their state by value. However, we have no mechanism to test for this. Even if we made it a rule that output ranges must be reference types, there's no static way to check for this.

Is there no other path forward here?

schveiguy avatar Jan 16 '18 19:01 schveiguy

@schveiguy it also works for ranges with assignable values a-la the retro example with copy.

JackStouffer avatar Jan 16 '18 19:01 JackStouffer

it also works for ranges with assignable values

That's not necessarily a requirement:

struct HasAssignableElements
{
   int front;
   void popFront() {front = 0;}
   enum empty = false;
}

schveiguy avatar Jan 16 '18 20:01 schveiguy

@schveiguy I don't understand what that's supposed to illustrate. This would be treated as an output range by copy before this change and would discard all the values it's given except the last.

JackStouffer avatar Jan 16 '18 20:01 JackStouffer

What I'm trying to say is, I don't think there's a static test that accepts all ranges that would be valid for copy and rejects all ranges that would be invalid.

For an example of something that should be fine with copy but rejected after your change (I think):

struct List(T)
{
    T value;
    List *next;
}

struct ListRange(T) // should be OK to use with copy
{
    List!T *cur;
    ref T front() { return cur.value; }
    void popFront() { cur = cur.next; }
    bool empty() { return cur == null; }
}

schveiguy avatar Jan 16 '18 20:01 schveiguy

Wait, I'm misunderstanding your tests. You are looking at ranges that don't have assignable elements, are input ranges, but are also output ranges. This... seems very odd.

schveiguy avatar Jan 16 '18 20:01 schveiguy

@schveiguy Specifically, I'm disallowing all TargetRanges which are output ranges but are also input ranges, because according to put any input range with assignable elements is an output range.

I'm trying to limit the deprecation to those output ranges which are put based.

JackStouffer avatar Jan 16 '18 20:01 JackStouffer

OK, I am confusing the TargetRange and the SourceRange checks. Regrouping...

schveiguy avatar Jan 16 '18 20:01 schveiguy

I'm focusing on the wrong thing. It's not valid input ranges that can be output ranges that cause problems, it's the output-only ranges that should also work. LockingTextWriter should work with copy, even if passed by value, because you aren't discarding the data. This is going to break lots of code that uses i/o via output ranges.

The real issue here is that copy needs to know whether the target is putting the data outside itself (and therefore can be passed by value). A crude test may be whether it has any pointers in it. But that still may not be enough, as you don't know if the pointers relate to the data itself, or something else (an unrelated string or something). As I said earlier, I don't think we can create such a test, unless we have a convention (a-la save, but that hasn't been stellar either).

The problem with this fix is that it purports to have such a test, but the test isn't good enough. This means you will be rejecting valid calls (such as with LockingTextWriter) and causing a different kind of confusion (why can't I call copy with XYZ?).

schveiguy avatar Jan 16 '18 20:01 schveiguy

I wonder if checking for purity could help in some way...

schveiguy avatar Jan 16 '18 20:01 schveiguy

@schveiguy I agree, it's not a perfect solution. There's no way to know a-priori if a output range is properly copy-able by value. But, the way we've figured out how to get around the problem is to always use ref when passing them around. It sidesteps the whole issue.

Like I noted above, LockingTextWriter just happens to work. There are many other forms of output ranges which fail silently with copy as it exists today. Further, copy's semantics are designed around the idea of array-like inputs (all of its tests using arrays at the bottom isn't an accident IMO). Most notably, the concept of copy returning the "unfilled" portion of the output range is not applicable to put based output ranges.

JackStouffer avatar Jan 16 '18 21:01 JackStouffer

the way we've figured out how to get around the problem is to always use ref when passing them around

What about taking the second overload (i.e. output ranges that are not input ranges with assignable values) by auto ref? That would allow the "bad" cases that actually work (such as LockingTextWriter) and fix copy for the digest cases.

schveiguy avatar Jan 16 '18 22:01 schveiguy

What about taking the second overload by auto ref?

My problem with that is you still have the issue that auto ref will allow users to do the wrong thing and many output ranges will fail silently. I'd rather not give the user the chance to make a mistake, if possible.

JackStouffer avatar Jan 17 '18 01:01 JackStouffer

Ping @jmdavis. You had a comment https://github.com/dlang/phobos/pull/5991#discussion_r159545584 on my output range toString PR which I think is relevant here, so I'd like to get your opinion.

JackStouffer avatar Jan 17 '18 01:01 JackStouffer

many output ranges will fail silently

I don't think there are many output ranges that have this problem. Most reference the data anyway, or are lvalues.

I am on the side of not breaking working, valid, reasonable code like the example stdin.byLineCopy(Yes.keepTerminator).array.sort.copy(stdout.lockingTextWriter).

schveiguy avatar Jan 17 '18 02:01 schveiguy

have you considered changing copy such that it takes its target by auto ref?

We can't do that completely, because that changes the semantics of copy with forward ranges that are input ranges with assignable elements:

auto buffer = new int[100];
iota(15).copy(buffer);
writeln(buffer[0 .. 15]);

Currently, this will print the numbers from 0 to 14. If copy is auto-ref, it will print 15 zeros.

I think we can use @JackStouffer's identification of the types of ranges that copy has problems with in this PR and just make THAT overload auto-ref. See https://github.com/dlang/phobos/pull/6039#issuecomment-358125018

schveiguy avatar Jan 17 '18 02:01 schveiguy

As no one pointed this out before - Jenkins failure is at tools:

changed.d(122): Deprecation: function std.algorithm.mutation.copy!(MapResult!(to, FilterResult!(__lambda5, Splitter!(cast(Flag)false, char[], Wrapper))), Appender!(int[])).copy is deprecated - std.algorithm.mutation.copy to non-arrays should be replaced with std.range.primitives.put

https://github.com/dlang/tools/blob/master/changed.d#L122

wilzbach avatar Jan 17 '18 11:01 wilzbach

@schveiguy @andralex @CyberShadow Removed the deprecation and made the new overload take the output range by auto ref. For the new overload, I also made it return void to avoid (heh) confusion with returning "unfilled" portions of output ranges when that doesn't apply.

JackStouffer avatar Jan 17 '18 14:01 JackStouffer

I like this direction better.

I have considered further that it's possible an output range accepted by reference may break some code. Namely an output range that does put elements into an external location, but isn't expected to affect the lvalue that was used.

It might be possible to deprecate this functionality, but it might be too clever. If you pass in an lvalue by reference, you can pragma(msg, "Warning, in version 2.0XX, this call will accept the target by reference, and affect the original"), and then make a copy before using it.

schveiguy avatar Jan 17 '18 15:01 schveiguy

I would like to hear from @andralex about the implications of using auto ref here, particularly surrounding this scenario: https://github.com/dlang/phobos/pull/6039#issuecomment-358347576

But other than that, LGTM.

schveiguy avatar Jan 18 '18 14:01 schveiguy

Fixed

JackStouffer avatar Jan 19 '18 19:01 JackStouffer

Ping @andralex

JackStouffer avatar Jan 19 '18 23:01 JackStouffer