dlang.org fix Issue 18016 - using uninitialized value is considered @safe but has undefined behavior

fix Issue 18016 - using uninitialized value is considered @safe but has undefined behavior

Open WalterBright opened this issue 6 years ago • 35 comments

Mar 04 '18 08:03 WalterBright

Thanks for your pull request, @WalterBright!

Bugzilla references

Auto-close	Bugzilla	Severity	Description
✓	18016	normal	using uninitialized value is considered @safe but has undefined behavior

Mar 04 '18 08:03 dlang-bot

With this change, compilers would still be allowed to leave the variable uninitialized, right? That means using uninitialized values would be allowed in D (as long as they're not interpreted as pointers). That would be a problem for pure functions. See issue 15542 (pure function with no argument returning different values (with void-initialized static array)).

Mar 04 '18 12:03 aG0aep6G

Pure functions returning void initialized data is still implementation defined behavior, even in @safe code.

Mar 12 '18 03:03 WalterBright

I don't think it's reasonable to allow turning @safe pure functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers. I'm sure there may be even reasoanle applications for using uninitialized memory, but all of them strictly belong to @system code. Pure functions in D are allowed a lot more freedom compared to functions in pure functional languages, but this is all based on the premise that their implementation details don't leak. Otherwise pure may become as meaningless as are const parameters in C++.

Mar 13 '18 09:03 PetarKirov

I don't think it's reasonable to allow turning @safe pure functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers.

It's implementation defined, not undefined. I don't think this is fundamentally different from returning allocated pointers from pure functions. The pointer values will be different each time, even though the function's arguments are identical.

May 27 '18 00:05 WalterBright

What's the status on this?

Jun 13 '18 13:06 RazvanN7

I think this is the wrong way to fix this problem, because it leaves gaping @safe-ty holes. Also see: https://issues.dlang.org/show_bug.cgi?id=19968

Note that issue 19968 is a symptom of a general problem. Allowing uninitialized memory to be read in @safe functions is a bad idea, and shouldn't be allowed by default.

Jun 15 '19 13:06 tgehr

Note that issue 19968 is a symptom of a general problem.

As I noted in the issue, it is a problem specific to bools.

Jun 17 '19 08:06 WalterBright

This is NOT a problem specific to bools!

The language specification states that accessing void-initialized data is UB.
This can be used to fill private fields of structs that have @trusted functions working on them with garbage.

Jun 17 '19 17:06 tgehr

The language specification states that accessing void-initialized data is UB.

Indeed, and hence this PR to fix that.

This is NOT a problem specific to bools!

I would appreciate an example of UB from other non-pointer types, as I can't think of one.

There's a reason I push on this. Consider:

char[100] tmpbuffer = void;
... code to use part of the buffer ...

It's a performance issue. I don't want to relegate this to @trusted code only.

Jun 18 '19 09:06 WalterBright

The language specification states that accessing void-initialized data is UB.

Indeed, and hence this PR to fix that.

This is NOT a problem specific to bools!

I would appreciate an example of UB from other non-pointer types, as I can't think of one. ...

module data;
int[N] array;
static assert(0<N);

struct VerifiedIndex{
    private int index_=0;
    @disable this(int); // shouldn't be necessary, but it is
    @property int index()@safe{ return index_; }
    @property void index(int i)@trusted{ // no, this can't be @safe
        enforce(0<=x&&x<N, "index out of bounds");
        index_=i;
    }
    int read()@trusted{ return array.ptr[index]; }
    void write(int x)@trusted{ array.ptr[index]=x; }
}

module app;
import data;

void main()@safe{
    VerifiedIndex i = void;
    i.write(1337);
}

There's a reason I push on this.

I understand, but no memory corruption in @safe code trumps performance in @safe code every time. "I want this to be possible in @safe code" is never more important than "@safe guarantees no memory corruption".

Consider:
char[100] tmpbuffer = void;
... code to use part of the buffer ...
It's a performance issue. I don't want to relegate this to @trusted code only.

The spec can define a certain subset of types for which this is @safe in impure functions. E.g., for structs, the constraint can be that void initialization is @safe for all fields, and all fields are public.
However, why can't you encapsulate this pattern into a struct with a @trusted interface? You need to do this anyway if you want to use the tmpbuffer for non-POD data.

Jun 19 '19 00:06 tgehr

Just directed here from a recent forum discussion. Reading Timon's example, it makes it difficult to justify this change.

There are a couple options I can think of to address this. First, and a good possibility, is to allow only POD types to be initialized as void. Anything that has methods, or contains any types that have methods, would be prevented from being void initialized in @safe code, due to the fact that those methods can be made to circumvent safety (via @trusted blocks or methods).

Second option is to somehow allow the type to opt-in to void initializability. Can't think of a good syntax, but this only slightly gives more discretion. Currently, if you void initialize a member, it does nothing, as the whole type is initialized anyway.

However, it may simply be prudent to keep the current rules and fully implement them in the compiler (i.e. void initialization is not allowed in safe code), because Timon is right -- any possibility of corruption with @safe code should be prevented. We can relax the rules later. The only thing that really sucks about this is, it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted. Is there a way around this?

Jan 06 '20 14:01 schveiguy

... The only thing that really sucks about this is, it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted. Is there a way around this?

As I stated earlier, you could do this:

struct TmpBuffer(T,size_t n){
    private T[n] buffer=void;
    // ... add @trusted interface here that ensures data is initialized before use
}

Also, given that accessing uninitialized memory is made defined (but non-deterministic) behavior, void-initialization can be kept @safe for all types for which @safe code can put any bit pattern into the memory occupied by the type without void-initialization. This does not include bool and it does not include structs that maintain some internal invariant.

Jan 06 '20 15:01 tgehr

you could do this

I don't know if that works, though. When I declare an instance of TmpBuffer, it initializes it to 0s.

Jan 06 '20 16:01 schveiguy

you could do this

I don't know if that works, though. When I declare an instance of TmpBuffer, it initializes it to 0s.

https://issues.dlang.org/show_bug.cgi?id=11331

Jan 06 '20 16:01 tgehr

Right, but that just means you can't use your proposed workaround/solution. Again, is there a way around it?

Jan 06 '20 16:01 schveiguy

@schveiguy:

it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted

Then wrap the rest of the function in a safe function to enable checking again:

@trusted foo() {
T[n] buffer = void;
() @safe {
// Manipulate buffer
...
}();
}

Jun 08 '20 11:06 ntrel

@ntrel a possibility, but not practical for templates where you want everything inferred.

Jun 08 '20 14:06 schveiguy

@schveiguy sounds like a job for your default(@safe) attribute then ;-)

Jun 08 '20 15:06 ntrel

I'm not sure that solves the problem. For example, if you want foo to be inferred @system for any @system usage inside the lambda, but want to trust just the void initialization, marking the lambda default(@safe) does not alter the @trusted marking on foo.

Jun 08 '20 15:06 schveiguy

@schveiguy Yes, you're right that doesn't solve it. I think it could be done something like this:

// user code
void parent(...) {
    void work(alias safeVoidBuffer) {...}
    mixin UseSafeVoidBuffer!(T, n, work);
}

// library
module safevoidbuf;

struct SafeVoidBuffer(T, size_t n) {...}

template UseSafeVoidBuffer(T, size_t n, alias worker) {
    () @trusted {
        SafeVoidBuffer!(T, n) buf = void;
        buf.initSafely();
        worker!buf();
    }();
    static if (isSystem!(worker!(...))) // assuming we can implement isSystem
        if (0) doSomethingSystem; // force system inference for parent function
}

So it might be doable. But supporting void initialization for struct fields seems like the best fix, avoiding the need for a nested function and mixin.

Jun 08 '20 17:06 ntrel

But supporting void initialization for struct fields seems like the best fix, avoiding the need for a nested function and mixin.

Yes.

Jun 08 '20 18:06 tgehr

I don't understand how we can honestly make an argument that void initialized variables are @safe. We put ourselves on a pedestal because we initialize variables by default (whereas C and C++ have a lot of safety issues for not doing it), but when it comes to void initialization (which is literally a mean to have uninitialized variables) we consider it @safe. This is a glaring contradiction and in my opinion the fix should be to disallow void initialization in @safe code.

Jun 16 '21 11:06 RazvanN7

I would appreciate an example of UB from other non-pointer types, as I can't think of one.

@edi33416 and I have master student that tries to find holes in D's safety guarantees. He has been able to obtain random code execution from overlapping the stack with the heap by allocating a large void initialized static array on the stack. Something similar to: https://issues.dlang.org/show_bug.cgi?id=17566 .

@WalterBright please reconsider this. At the very least, it should be illegal to have void initialized stack allocated static arrays.

Jun 16 '21 11:06 RazvanN7

I don't understand how we can honestly make an argument that void initialized variables are @safe.

A void-initialized variable can be @safe if:

Reading from it is guaranteed by the compiler and the language spec to result in an unspecified value rather than undefined behavior.
Every possible value of that variable's type is a safe value.

Aug 12 '21 21:08 pbackus

I don't think it's reasonable to allow turning @safe pure functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers. I'm sure there may be even reasoanle applications for using uninitialized memory, but all of them strictly belong to @system code. Pure functions in D are allowed a lot more freedom compared to functions in pure functional languages, but this is all based on the premise that their implementation details don't leak. Otherwise pure may become as meaningless as are const parameters in C++.

I agree. Pure in D is becoming meaningless with this type of behaviours.

void main() @safe
{
    auto voidInitialize(size_t Tsize)() @safe pure
    {
        ubyte[Tsize] _ = void;
        void[Tsize] ret = _[];
        return ret;
    }
    
    import std.stdio : writeln;
    voidInitialize!100.writeln;
    voidInitialize!100.writeln;
    voidInitialize!100.writeln;
}

This should simply not be an accepted code.

Weak purity rules can be checked if any reference is passed to the function, but void initializers in a pure function allow this code to be considered pure, which cannot be checked at all from an outside scope, as you do by reading the function signature and check if any reference is being passed and possibly changed.

I don't understand how we can honestly make an argument that void initialized variables are @safe. We put ourselves on a pedestal because we initialize variables by default (whereas C and C++ have a lot of safety issues for not doing it), but when it comes to void initialization (which is literally a mean to have uninitialized variables) we consider it @safe. This is a glaring contradiction and in my opinion the fix should be to disallow void initialization in @safe code.

You miss understanding the @safe behaviour. Void initializers if not stated as undefined behaviour and doesn't initialize a type with indirections produces valid safe values. Inconsistent program state is different from @safe.

My opinion on this is that pure functions shouldn't allow returning non-setted void initialized memory although, it is perfectly @safe, according to the @safe rules of D as long as the type is not a reference type or has indirections.

Aug 17 '21 16:08 ljmf00

I would appreciate an example of UB from other non-pointer types, as I can't think of one.

@edi33416 and I have master student that tries to find holes in D's safety guarantees. He has been able to obtain random code execution from overlapping the stack with the heap by allocating a large void initialized static array on the stack. Something similar to: issues.dlang.org/show_bug.cgi?id=17566 .

@WalterBright please reconsider this. At the very least, it should be illegal to have void initialized stack allocated static arrays.

I would still consider implementation defined or defined behaviour as this can be checked at runtime. Take the example of GCC flag -fstack-check and -fstack-clash-protection from LLVM. This prevents that stack clashing from occurring. And this code segfaults anyway with the default initializer. This code uses unsafe calls previously that may be considered undefined behaviour too because those calls don't provide a safe interface.

Aug 17 '21 18:08 ljmf00

This code uses unsafe calls previously that may be considered undefined behaviour too because those calls don't provide a safe interface.

Not having a safe interface doesn't mean every call can be considered undefined behavior. Undefined behavior only comes up when the function is called incorrectly.

I'm not sure what code you're referring to. If it's issue 17566, then the only relevant call is the one to mmap. As far as I can tell, it's called correctly. If you think there's a problem with how mmap is called, please elaborate.

Aug 17 '21 23:08 aG0aep6G

Not having a safe interface doesn't mean every call can be considered undefined behavior.

True, but there's a possibility. ("that may be considered undefined behaviour")

Undefined behavior only comes up when the function is called incorrectly.

No. If the function internally does something against D rules it may cause undefined behaviour like dereferencing a pointer that is pointing to an invalid object. See 12.1.1.5. point in the specification.

I'm not sure what code you're referring to. If it's issue 17566, then the only relevant call is the one to mmap. As far as I can tell, it's called correctly. If you think there's a problem with how mmap is called, please elaborate.

In the mmap call you are passing a reference that is invalid. You are manipulating a reference.

void* dst = stackBottom - sz;

mmap may internally dereference it and cast it to another type. Casting is unsafe (See 19.24.1.3. point in the specification) and dereferencing it is undefined behaviour as that reference is manipulated to be invalid. You can say that this is valid within the stack bounds but it is not guaranteed to be a semantically behaviour defined D program according to the specification.

Aug 18 '21 00:08 ljmf00

Not having a safe interface doesn't mean every call can be considered undefined behavior.

True, but there's a possibility. ("that may be considered undefined behaviour")

No, there isn't a different kind of "possibility" of UB for unsafe interfaces. When used incorrectly, both safe and unsafe interfaces can exhibit UB. When used correctly, they both still can, but only due to bugs.

The difference between safe and unsafe interfaces is how a correct call is defined. For safe interfaces, it's a bunch of language rules. For unsafe interfaces, it's an agreement between the author of the function and the user.

Undefined behavior only comes up when the function is called incorrectly.

No. If the function internally does something against D rules it may cause undefined behaviour like dereferencing a pointer that is pointing to an invalid object. See 12.1.1.5. point in the specification.

If an instance of UB is not due to an incorrect call, then it's a bug. Bugs can cause UB in both safe and unsafe interfaces, so they're not relevant here.

In the mmap call you are passing a reference that is invalid. You are manipulating a reference.
void* dst = stackBottom - sz;

That's an unsafe operation. Doesn't mean it has undefined behavior. Passing a garbage pointer to an @system function is fine if that function doesn't dereference it (and doesn't expose it to @safe code, yada, yada, yada). It does not have more of "a possibility" of UB than calling an @safe function (which also has "a possibility" of dereferencing an invalid pointer due to a bug).

mmap may internally dereference it and cast it to another type.

It doesn't. You can read here what mmap does: https://man7.org/linux/man-pages/man2/mmap.2.html

I maintain: mmap is being used correctly. The call does not exhibit "a possibility" of undefined behavior that is worth mentioning in this context. It's the same "possibility" as when calling an @safe function.

Aug 18 '21 01:08 aG0aep6G

dlang.org dlang.org copied to clipboard

fix Issue 18016 - using uninitialized value is considered @safe but has undefined behavior

Bugzilla references

dlang.org
dlang.org copied to clipboard