dlang.org icon indicating copy to clipboard operation
dlang.org copied to clipboard

fix Issue 18016 - using uninitialized value is considered @safe but has undefined behavior

Open WalterBright opened this issue 6 years ago • 35 comments

WalterBright avatar Mar 04 '18 08:03 WalterBright

Thanks for your pull request, @WalterBright!

Bugzilla references

Auto-close Bugzilla Severity Description
18016 normal using uninitialized value is considered @safe but has undefined behavior

dlang-bot avatar Mar 04 '18 08:03 dlang-bot

With this change, compilers would still be allowed to leave the variable uninitialized, right? That means using uninitialized values would be allowed in D (as long as they're not interpreted as pointers). That would be a problem for pure functions. See issue 15542 (pure function with no argument returning different values (with void-initialized static array)).

aG0aep6G avatar Mar 04 '18 12:03 aG0aep6G

Pure functions returning void initialized data is still implementation defined behavior, even in @safe code.

WalterBright avatar Mar 12 '18 03:03 WalterBright

I don't think it's reasonable to allow turning @safe pure functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers. I'm sure there may be even reasoanle applications for using uninitialized memory, but all of them strictly belong to @system code. Pure functions in D are allowed a lot more freedom compared to functions in pure functional languages, but this is all based on the premise that their implementation details don't leak. Otherwise pure may become as meaningless as are const parameters in C++.

PetarKirov avatar Mar 13 '18 09:03 PetarKirov

I don't think it's reasonable to allow turning @safe pure functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers.

It's implementation defined, not undefined. I don't think this is fundamentally different from returning allocated pointers from pure functions. The pointer values will be different each time, even though the function's arguments are identical.

WalterBright avatar May 27 '18 00:05 WalterBright

What's the status on this?

RazvanN7 avatar Jun 13 '18 13:06 RazvanN7

I think this is the wrong way to fix this problem, because it leaves gaping @safe-ty holes. Also see: https://issues.dlang.org/show_bug.cgi?id=19968

Note that issue 19968 is a symptom of a general problem. Allowing uninitialized memory to be read in @safe functions is a bad idea, and shouldn't be allowed by default.

tgehr avatar Jun 15 '19 13:06 tgehr

Note that issue 19968 is a symptom of a general problem.

As I noted in the issue, it is a problem specific to bools.

WalterBright avatar Jun 17 '19 08:06 WalterBright

This is NOT a problem specific to bools!

  • The language specification states that accessing void-initialized data is UB.
  • This can be used to fill private fields of structs that have @trusted functions working on them with garbage.

tgehr avatar Jun 17 '19 17:06 tgehr

The language specification states that accessing void-initialized data is UB.

Indeed, and hence this PR to fix that.

This is NOT a problem specific to bools!

I would appreciate an example of UB from other non-pointer types, as I can't think of one.

There's a reason I push on this. Consider:

char[100] tmpbuffer = void;
... code to use part of the buffer ...

It's a performance issue. I don't want to relegate this to @trusted code only.

WalterBright avatar Jun 18 '19 09:06 WalterBright

The language specification states that accessing void-initialized data is UB.

Indeed, and hence this PR to fix that.

This is NOT a problem specific to bools!

I would appreciate an example of UB from other non-pointer types, as I can't think of one. ...

module data;
int[N] array;
static assert(0<N);

struct VerifiedIndex{
    private int index_=0;
    @disable this(int); // shouldn't be necessary, but it is
    @property int index()@safe{ return index_; }
    @property void index(int i)@trusted{ // no, this can't be @safe
        enforce(0<=x&&x<N, "index out of bounds");
        index_=i;
    }
    int read()@trusted{ return array.ptr[index]; }
    void write(int x)@trusted{ array.ptr[index]=x; }
} 
module app;
import data;

void main()@safe{
    VerifiedIndex i = void;
    i.write(1337);
}

There's a reason I push on this.

I understand, but no memory corruption in @safe code trumps performance in @safe code every time. "I want this to be possible in @safe code" is never more important than "@safe guarantees no memory corruption".

Consider:

char[100] tmpbuffer = void;
... code to use part of the buffer ...

It's a performance issue. I don't want to relegate this to @trusted code only.

  • The spec can define a certain subset of types for which this is @safe in impure functions. E.g., for structs, the constraint can be that void initialization is @safe for all fields, and all fields are public.
  • However, why can't you encapsulate this pattern into a struct with a @trusted interface? You need to do this anyway if you want to use the tmpbuffer for non-POD data.

tgehr avatar Jun 19 '19 00:06 tgehr

Just directed here from a recent forum discussion. Reading Timon's example, it makes it difficult to justify this change.

There are a couple options I can think of to address this. First, and a good possibility, is to allow only POD types to be initialized as void. Anything that has methods, or contains any types that have methods, would be prevented from being void initialized in @safe code, due to the fact that those methods can be made to circumvent safety (via @trusted blocks or methods).

Second option is to somehow allow the type to opt-in to void initializability. Can't think of a good syntax, but this only slightly gives more discretion. Currently, if you void initialize a member, it does nothing, as the whole type is initialized anyway.

However, it may simply be prudent to keep the current rules and fully implement them in the compiler (i.e. void initialization is not allowed in safe code), because Timon is right -- any possibility of corruption with @safe code should be prevented. We can relax the rules later. The only thing that really sucks about this is, it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted. Is there a way around this?

schveiguy avatar Jan 06 '20 14:01 schveiguy

... The only thing that really sucks about this is, it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted. Is there a way around this?

As I stated earlier, you could do this:

struct TmpBuffer(T,size_t n){
    private T[n] buffer=void;
    // ... add @trusted interface here that ensures data is initialized before use
}

Also, given that accessing uninitialized memory is made defined (but non-deterministic) behavior, void-initialization can be kept @safe for all types for which @safe code can put any bit pattern into the memory occupied by the type without void-initialization. This does not include bool and it does not include structs that maintain some internal invariant.

tgehr avatar Jan 06 '20 15:01 tgehr

you could do this

I don't know if that works, though. When I declare an instance of TmpBuffer, it initializes it to 0s.

schveiguy avatar Jan 06 '20 16:01 schveiguy

you could do this

I don't know if that works, though. When I declare an instance of TmpBuffer, it initializes it to 0s.

https://issues.dlang.org/show_bug.cgi?id=11331

tgehr avatar Jan 06 '20 16:01 tgehr

Right, but that just means you can't use your proposed workaround/solution. Again, is there a way around it?

schveiguy avatar Jan 06 '20 16:01 schveiguy

@schveiguy:

it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted

Then wrap the rest of the function in a safe function to enable checking again:

@trusted foo() {
T[n] buffer = void;
() @safe {
// Manipulate buffer
...
}();
}

ntrel avatar Jun 08 '20 11:06 ntrel

@ntrel a possibility, but not practical for templates where you want everything inferred.

schveiguy avatar Jun 08 '20 14:06 schveiguy

@schveiguy sounds like a job for your default(@safe) attribute then ;-)

ntrel avatar Jun 08 '20 15:06 ntrel

I'm not sure that solves the problem. For example, if you want foo to be inferred @system for any @system usage inside the lambda, but want to trust just the void initialization, marking the lambda default(@safe) does not alter the @trusted marking on foo.

schveiguy avatar Jun 08 '20 15:06 schveiguy

@schveiguy Yes, you're right that doesn't solve it. I think it could be done something like this:

// user code
void parent(...) {
    void work(alias safeVoidBuffer) {...}
    mixin UseSafeVoidBuffer!(T, n, work);
}

// library
module safevoidbuf;

struct SafeVoidBuffer(T, size_t n) {...}

template UseSafeVoidBuffer(T, size_t n, alias worker) {
    () @trusted {
        SafeVoidBuffer!(T, n) buf = void;
        buf.initSafely();
        worker!buf();
    }();
    static if (isSystem!(worker!(...))) // assuming we can implement isSystem
        if (0) doSomethingSystem; // force system inference for parent function
}

So it might be doable. But supporting void initialization for struct fields seems like the best fix, avoiding the need for a nested function and mixin.

ntrel avatar Jun 08 '20 17:06 ntrel

But supporting void initialization for struct fields seems like the best fix, avoiding the need for a nested function and mixin.

Yes.

tgehr avatar Jun 08 '20 18:06 tgehr

I don't understand how we can honestly make an argument that void initialized variables are @safe. We put ourselves on a pedestal because we initialize variables by default (whereas C and C++ have a lot of safety issues for not doing it), but when it comes to void initialization (which is literally a mean to have uninitialized variables) we consider it @safe. This is a glaring contradiction and in my opinion the fix should be to disallow void initialization in @safe code.

RazvanN7 avatar Jun 16 '21 11:06 RazvanN7

I would appreciate an example of UB from other non-pointer types, as I can't think of one.

@edi33416 and I have master student that tries to find holes in D's safety guarantees. He has been able to obtain random code execution from overlapping the stack with the heap by allocating a large void initialized static array on the stack. Something similar to: https://issues.dlang.org/show_bug.cgi?id=17566 .

@WalterBright please reconsider this. At the very least, it should be illegal to have void initialized stack allocated static arrays.

RazvanN7 avatar Jun 16 '21 11:06 RazvanN7

I don't understand how we can honestly make an argument that void initialized variables are @safe.

A void-initialized variable can be @safe if:

  1. Reading from it is guaranteed by the compiler and the language spec to result in an unspecified value rather than undefined behavior.
  2. Every possible value of that variable's type is a safe value.

pbackus avatar Aug 12 '21 21:08 pbackus

I don't think it's reasonable to allow turning @safe pure functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers. I'm sure there may be even reasoanle applications for using uninitialized memory, but all of them strictly belong to @system code. Pure functions in D are allowed a lot more freedom compared to functions in pure functional languages, but this is all based on the premise that their implementation details don't leak. Otherwise pure may become as meaningless as are const parameters in C++.

I agree. Pure in D is becoming meaningless with this type of behaviours.

void main() @safe
{
    auto voidInitialize(size_t Tsize)() @safe pure
    {
        ubyte[Tsize] _ = void;
        void[Tsize] ret = _[];
        return ret;
    }
    
    import std.stdio : writeln;
    voidInitialize!100.writeln;
    voidInitialize!100.writeln;
    voidInitialize!100.writeln;
}

This should simply not be an accepted code.

Weak purity rules can be checked if any reference is passed to the function, but void initializers in a pure function allow this code to be considered pure, which cannot be checked at all from an outside scope, as you do by reading the function signature and check if any reference is being passed and possibly changed.

I don't understand how we can honestly make an argument that void initialized variables are @safe. We put ourselves on a pedestal because we initialize variables by default (whereas C and C++ have a lot of safety issues for not doing it), but when it comes to void initialization (which is literally a mean to have uninitialized variables) we consider it @safe. This is a glaring contradiction and in my opinion the fix should be to disallow void initialization in @safe code.

You miss understanding the @safe behaviour. Void initializers if not stated as undefined behaviour and doesn't initialize a type with indirections produces valid safe values. Inconsistent program state is different from @safe.


My opinion on this is that pure functions shouldn't allow returning non-setted void initialized memory although, it is perfectly @safe, according to the @safe rules of D as long as the type is not a reference type or has indirections.

ljmf00 avatar Aug 17 '21 16:08 ljmf00

I would appreciate an example of UB from other non-pointer types, as I can't think of one.

@edi33416 and I have master student that tries to find holes in D's safety guarantees. He has been able to obtain random code execution from overlapping the stack with the heap by allocating a large void initialized static array on the stack. Something similar to: issues.dlang.org/show_bug.cgi?id=17566 .

@WalterBright please reconsider this. At the very least, it should be illegal to have void initialized stack allocated static arrays.

I would still consider implementation defined or defined behaviour as this can be checked at runtime. Take the example of GCC flag -fstack-check and -fstack-clash-protection from LLVM. This prevents that stack clashing from occurring. And this code segfaults anyway with the default initializer. This code uses unsafe calls previously that may be considered undefined behaviour too because those calls don't provide a safe interface.

ljmf00 avatar Aug 17 '21 18:08 ljmf00

This code uses unsafe calls previously that may be considered undefined behaviour too because those calls don't provide a safe interface.

Not having a safe interface doesn't mean every call can be considered undefined behavior. Undefined behavior only comes up when the function is called incorrectly.

I'm not sure what code you're referring to. If it's issue 17566, then the only relevant call is the one to mmap. As far as I can tell, it's called correctly. If you think there's a problem with how mmap is called, please elaborate.

aG0aep6G avatar Aug 17 '21 23:08 aG0aep6G

Not having a safe interface doesn't mean every call can be considered undefined behavior.

True, but there's a possibility. ("that may be considered undefined behaviour")

Undefined behavior only comes up when the function is called incorrectly.

No. If the function internally does something against D rules it may cause undefined behaviour like dereferencing a pointer that is pointing to an invalid object. See 12.1.1.5. point in the specification.

I'm not sure what code you're referring to. If it's issue 17566, then the only relevant call is the one to mmap. As far as I can tell, it's called correctly. If you think there's a problem with how mmap is called, please elaborate.

In the mmap call you are passing a reference that is invalid. You are manipulating a reference.

void* dst = stackBottom - sz;

mmap may internally dereference it and cast it to another type. Casting is unsafe (See 19.24.1.3. point in the specification) and dereferencing it is undefined behaviour as that reference is manipulated to be invalid. You can say that this is valid within the stack bounds but it is not guaranteed to be a semantically behaviour defined D program according to the specification.

ljmf00 avatar Aug 18 '21 00:08 ljmf00

Not having a safe interface doesn't mean every call can be considered undefined behavior.

True, but there's a possibility. ("that may be considered undefined behaviour")

No, there isn't a different kind of "possibility" of UB for unsafe interfaces. When used incorrectly, both safe and unsafe interfaces can exhibit UB. When used correctly, they both still can, but only due to bugs.

The difference between safe and unsafe interfaces is how a correct call is defined. For safe interfaces, it's a bunch of language rules. For unsafe interfaces, it's an agreement between the author of the function and the user.

Undefined behavior only comes up when the function is called incorrectly.

No. If the function internally does something against D rules it may cause undefined behaviour like dereferencing a pointer that is pointing to an invalid object. See 12.1.1.5. point in the specification.

If an instance of UB is not due to an incorrect call, then it's a bug. Bugs can cause UB in both safe and unsafe interfaces, so they're not relevant here.

In the mmap call you are passing a reference that is invalid. You are manipulating a reference.

void* dst = stackBottom - sz;

That's an unsafe operation. Doesn't mean it has undefined behavior. Passing a garbage pointer to an @system function is fine if that function doesn't dereference it (and doesn't expose it to @safe code, yada, yada, yada). It does not have more of "a possibility" of UB than calling an @safe function (which also has "a possibility" of dereferencing an invalid pointer due to a bug).

mmap may internally dereference it and cast it to another type.

It doesn't. You can read here what mmap does: https://man7.org/linux/man-pages/man2/mmap.2.html

I maintain: mmap is being used correctly. The call does not exhibit "a possibility" of undefined behavior that is worth mentioning in this context. It's the same "possibility" as when calling an @safe function.

aG0aep6G avatar Aug 18 '21 01:08 aG0aep6G