phobos icon indicating copy to clipboard operation
phobos copied to clipboard

Safe ref counted

Open dukc opened this issue 2 years ago • 65 comments

This is off Atila's #8101 . I wanted to create the PR against his fork but GitHub didn't allow that.

I tried to reproduce the safety bug @Geod24 found, but could not. It appears that language updates have automatically corrected that. In fact I had to add an additional @trusted wrapper to the constructor in question so that the language would not prevent it's use with scope data totally. I added a test to verify that the safety bug stays gone.

I also had the destructor to check that dip1000 is enabled, otherwise it now stays @system.

dukc avatar Jan 26 '22 20:01 dukc

Thanks for your pull request and interest in making D better, @dukc! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8368"

dlang-bot avatar Jan 26 '22 20:01 dlang-bot

Challenge accepted

import std.stdio;
import std.typecons;

struct Container
{
    ubyte[] data;
}

struct Blob
{
    ubyte[64] data;
}


void main ()
{
    auto ptr = getPtr();
    writeln(ptr);
    const save = ptr.dup;
    auto other = getPtr();
    assert(save == ptr);
}

ubyte[] getPtr ()
{
    Blob blob;
    RefCounted!Container rc;
    escape(rc.refCountedPayload(), blob);
    assert(rc.refCountedPayload().data !is null);
    return rc.refCountedPayload().data;
}

void escape (ref Container rc, ref Blob local)
{
    rc.data = local.data;
}

Compiles, runs, crashes. That might be more of a compiler bug than a RefCounted bug though.

Command used:

$ ./dmd/generated/osx/release/64/dmd -preview=dip1000 -run breakdip1000.d

Commits used:

$ git -C dmd show HEAD
commit 71ba3fa46221daa0539b03ed7e41f454bd6ce326
Author: Iain Buclaw <[email protected]>
Date:   Wed Jan 26 15:15:44 2022 +0100

    posix.mak: Invoke bash to run the bootstrap script (#13549)
    
$ git -C druntime show HEAD
commit 33511e263134530a5994a775b03a061ea3f1aa34
Author: Sönke Ludwig <[email protected]>
Date:   Mon Jan 24 17:30:16 2022 +0100

    Fix GetICMProfile signature

    See https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-geticmprofilew
    
$ git -C phobos show HEAD
commit 90d4336ccd96a096c26c393fe1c081c8fd09433e (HEAD, dukc/safe-ref-counted)
Author: Ate Eskola <[email protected]>
Date:   Wed Jan 26 21:43:02 2022 +0200

    Addressed the concerns in reviews

Geod24 avatar Jan 27 '22 04:01 Geod24

Thanks! Yeah, as you suspected it's a compiler bug that can be abused without Phobos at all:

@safe unittest //this compiles
{
    void escape (ref ubyte[] arr, ref ubyte[64] local)
    {
        arr = local;
    }

    @safe ubyte[] getPtr ()
    {
        ubyte[64] blob;
        ubyte[] arr;
        escape(arr, blob);
        return arr; // Ahh, just like bad old C days!
    }
}

Gotta report, if I don't find it already reported.

dukc avatar Jan 27 '22 21:01 dukc

Didn't find an existing issue, so submitted: https://issues.dlang.org/show_bug.cgi?id=22709

I have another possible vulnerability in mind, but It's late around here so I'll test it another day. I hope I won't forget what I'm having in mind.

dukc avatar Jan 27 '22 21:01 dukc

I also looked a bit deeper into the internals of RefCounted, and there are some issues that prevented me from exploiting some other features. For example, opAssign takes its argument by value, not ref. Given the use cases of RefCounted, I think that could lead to some very undesirable behavior, but I'll explore this once the low-hanging compiler bugs are fixed :)

Geod24 avatar Jan 28 '22 04:01 Geod24

Issue 22709 has been fixed: https://github.com/dlang/dmd/pull/13577

RazvanN7 avatar Jan 28 '22 11:01 RazvanN7

Tried to implement my plan to hack this implementation, along the lines of this:

@safe unittest
{
    @safe void assignCrap(RefCounted!(int*) countedRef)
    {
        int local;
        auto anotherRef = countedRef; // I was thinking that anotherRef has shorter lifetime than local...
        anotherRef.refCountedPayload() = &local; // ...so that this would compile
    }
}

The compiler does not fall for this, nor for some similar variations I tried. I THINK it is because the return attribute in refCountedPayload makes the reference to the payload to be of shorter lifetime than anotherRef, not the payload itself.

That does not mean that this is safe, just that my poor head is running out of cunning with this. Perhaps by using a struct payload and using some member function to play with the lifetimes of the variables it is still hackable.

dukc avatar Jan 28 '22 20:01 dukc

This is the problematic case:

@safe unittest
{
    auto rc = RefCounted!int(123);
    (ref int n) {
        destroy(rc);
        int oops = n; // use-after-free if allowed
    }(rc.refCountedPayload);
}

As far as I know, in order to ensure this can't happen in @safe code, either refCountedPayload or the destructor must be @system. Currently, it looks like they are both @safe (though I haven't tested it myself). If so, that means this PR is unsound.

pbackus avatar Jan 29 '22 02:01 pbackus

Good catch! That one compiles, and also a simpler variant of it (in my opinion) compiles:

@safe unittest
{
    auto rc = RefCounted!int(123);
    int* ptr = &rc.refCountedPayload();
    destroy(rc);
    int oops = *ptr;
}

I think the problem is allowing destruction of rc before end of the scope. The destructor is (as far as we have discovered) @safe if called at the end of the lifetime, but not otherwise. Meaning, this PR might be safe in @live but not otherwise :(.

Worse, I think this applies to about any pattern -dip1000 would enable:

@safe void abuse(Container)()
{ auto cont = Container([1,2,3]);
  scope ptr = &cont.front;
  destroy(cont);
  int oops = *ptr;
}

dukc avatar Jan 29 '22 08:01 dukc

Worse, I think this applies to about any pattern -dip1000 would enable:

There is one pattern that can work here, which is a callback-based API:

auto ref apply(alias callback, T)(auto ref RefCounted!T this_)
{
    // Ensures a reference is held until after `callback` returns
    auto hold = this_;
    // Explicit `scope` prevents `callback` from escaping a reference
    scope ptr = () @trusted { return &this_.refCountedPayload(); }();
    return callback(*ptr);
}

// Usage
@safe unittest
{
    auto rc = RefCounted!int(123);
    rc.apply!((ref n) {
        destroy(rc);
        int ok = n; // ok - kept alive by `hold`
    });
}

The downsides of this approach are that (a) the syntax isn't as nice, and (b) it requires an extra ref count increment/decrement at runtime.

I think the best we can do, without a real ownership/borrowing system in the language (@live doesn't count), is to provide both a @safe callback-based API for users willing to accept the runtime overhead, and a @system API that gives direct access, and puts the responsibility on users not to create dangling references. In other words: make refCountedPayload @system and add a @safe apply.

pbackus avatar Jan 29 '22 14:01 pbackus

@pbackus : What if someone calls destroy twice ?

Geod24 avatar Jan 31 '22 09:01 Geod24

@Geod24 The destructor should reset _refCounted to RefCountedStore.init to ensure that it is idempotent even if destroy!false is used.

pbackus avatar Jan 31 '22 13:01 pbackus

There is one pattern that can work here, which is a callback-based API:

Going to do it like that, when I find time for it.

Perhaps there is a way to get the refCountedPayload working in @safe. I'm thinking about making the destructor private, and defining an UDA at DRuntime, let's say noEarlyDestroy. destroy would be @system for types annotated with that attribute. What do you think, would that work? In any case, I'm not going to implement that in this PR.

dukc avatar Feb 02 '22 21:02 dukc

I'm thinking about making the destructor private, and defining an UDA at DRuntime, let's say noEarlyDestroy. destroy would be @system for types annotated with that attribute. What do you think, would that work?

A similar idea has been suggested before: https://issues.dlang.org/show_bug.cgi?id=21981

Currently, it would not work, because @safe can bypass private using __traits(getMember). Fixing that issue requires breaking existing code, so it will require a deprecation cycle at the very least.

pbackus avatar Feb 02 '22 22:02 pbackus

That feature of __traits(getMember) breaks @safety of pretty much everything anyway, so I think we're best of pretending it does not exist.

dukc avatar Feb 02 '22 22:02 dukc

Well, it's entirely possible at this point that the issue with __traits(getMember) will never be fixed. That's one of the motivations behind @system variables (DIP 1035): they're a way to block access from @safe code that doesn't rely on private.

Until and unless we have 100% confirmation that private will be made inviolable in @safe code, we should not commit to adding any new language features that rely on private for memory safety.

pbackus avatar Feb 02 '22 22:02 pbackus

Until and unless we have 100% confirmation that private will be made inviolable in @safe code, we should not commit to adding any new language features that rely on private for memory safety.

Yeah, since DIP1035 can fix most but not all problems of __traits(getmember), it is within possibility that W&A declare violating privacy via it is an official feature even in @safe, rather than a bug. I agree we need an official decision on that before trying that @noEarlyDestroy thing.

Perhaps a forum topic is in order. On the other hand if DIP1035 gets rejected (I would be disappointed) no discussion is needed IMO: that behaviour is unambiguously a bug.

dukc avatar Feb 04 '22 20:02 dukc

Ahh, finally found the "convert to draft" button

dukc avatar Feb 04 '22 20:02 dukc

Mostly done in my local tree. Still trying to get automatic inference of @safe working, it will compile if apply is used in explicit @safe but for some reason if inference is on it stays @system.

I encountered one problem though: consistency. If apply is used on a null Nullable, it will simply do nothing. However if it's a null RefCounted, it will either auto-initialize it or trigger an assertion. Not very consistent. Should I think another name for apply on RefCounted, or alternatively make it no-op on null RefCounted?

dukc avatar Feb 12 '22 11:02 dukc

Should I think another name for apply on RefCounted, or alternatively make it no-op on null RefCounted?

I think renaming it is better in this case, given the difference in how the two versions handle the return value of the callback (wrap it in a Nullable vs return it directly).

A couple ideas:

rc.borrow!((scope ref payload) { /* ... */ });
rc.withPayload!((scope ref payload) { /* ... */ });

pbackus avatar Feb 12 '22 14:02 pbackus

borrow sounds good.

dukc avatar Feb 12 '22 14:02 dukc

Obviously I need to fix the autotest suite at some point, but otherwise ready for more comments now.

dukc avatar Feb 16 '22 20:02 dukc

Oh no. A dmd test that uses dirEntries is trying to link to unsafe destructor of DirIterator, which it can't find because it just became safe. I'm guessing it happens because the DMD test is compiled without DIP 1000, but the phobos binary is compiled with.

If I'm right, making DirIterator a template should solve the problem. Here goes...

dukc avatar Feb 23 '22 22:02 dukc

It looks like it's the test runner itself (run.d) that fails to compile. Maybe an easier fix would be to have the DMD build system compile it with -preivew=dip1000 enabled?

pbackus avatar Feb 23 '22 22:02 pbackus

It looks like it's the test runner itself (run.d) that fails to compile. Maybe an easier fix would be to have the DMD build system compile it with -preivew=dip1000 enabled?

I think I must avoid that separate compilation issue in any case. A Phobos user that happens to not use -dip1000 would face the same issue.

dukc avatar Feb 23 '22 22:02 dukc

Ping @dukc , what is the status of this?

ljmf00 avatar Mar 27 '22 19:03 ljmf00

Ping @dukc , what is the status of this?

Waiting for me to get the test suite green. I'm going to but it may take long because I don't currently have much extra time.

dukc avatar Mar 28 '22 19:03 dukc

Progress update: the thing I'm struggling with is that while I can indeed solve the linking issue with dirEntries itself by making it a template, DirIterator has some generated destructors that don't go away for some reason even if I templatise it and/or add a destructor myself. The symbol it tries to link to changes in the latter case, but that's it. Currently trying to test why it happens.

dukc avatar Apr 06 '22 18:04 dukc

Found the problem. It's this:

struct _DirIteratorTempl()
{
  //dir iterator implementation...
}

alias DirIterator = _DirIteratorTempl!();

auto dirEntries()(string path, SpanMode mode, bool followSymlink = true)
{
  //this works, no linker errors
  return _DirIteratorTempl!()(path, mode, followSymlink);
}

auto dirEntries()(string path, SpanMode mode, bool followSymlink = true)
{
  //this gives linker errors just as if _DirIteratorTempl wasn't a template at all.
  return DirIterator(path, mode, followSymlink);
}

We're lucky that the DirIterator name is undocumented, so it should not be a problem that I break it by making it a template. Otherwise we would be out of luck, because I could not alias it so that it would work without -dip1000 or -i=std.

dukc avatar Apr 06 '22 19:04 dukc

So it seems that BinaryHeap.front is supposed to be @safe, but isn't anymore because it relied on access to RefCounted payload being @safe. Sigh. And CircleCI is reporting a linking error in DRunTime. There should be no way in the world that changes to RefCounted could cause DRunTime to not link but OTOH we already saw it causing a DMD script to not link. Who knows.

dukc avatar Apr 13 '22 20:04 dukc