dlang.org
dlang.org copied to clipboard
fix Issue 18016 - using uninitialized value is considered @safe but has undefined behavior
Thanks for your pull request, @WalterBright!
Bugzilla references
Auto-close | Bugzilla | Severity | Description |
---|---|---|---|
✓ | 18016 | normal | using uninitialized value is considered @safe but has undefined behavior |
With this change, compilers would still be allowed to leave the variable uninitialized, right? That means using uninitialized values would be allowed in D (as long as they're not interpreted as pointers). That would be a problem for pure
functions. See issue 15542 (pure function with no argument returning different values (with void-initialized static array)).
Pure functions returning void initialized data is still implementation defined behavior, even in @safe code.
I don't think it's reasonable to allow turning @safe pure
functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers. I'm sure there may be even reasoanle applications for using uninitialized memory, but all of them strictly belong to @system
code. Pure functions in D are allowed a lot more freedom compared to functions in pure functional languages, but this is all based on the premise that their implementation details don't leak. Otherwise pure
may become as meaningless as are const
parameters in C++.
I don't think it's reasonable to allow turning @safe pure functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers.
It's implementation defined, not undefined. I don't think this is fundamentally different from returning allocated pointers from pure functions. The pointer values will be different each time, even though the function's arguments are identical.
What's the status on this?
I think this is the wrong way to fix this problem, because it leaves gaping @safe-ty holes. Also see: https://issues.dlang.org/show_bug.cgi?id=19968
Note that issue 19968 is a symptom of a general problem. Allowing uninitialized memory to be read in @safe functions is a bad idea, and shouldn't be allowed by default.
Note that issue 19968 is a symptom of a general problem.
As I noted in the issue, it is a problem specific to bools.
This is NOT a problem specific to bools!
- The language specification states that accessing
void
-initialized data is UB. - This can be used to fill private fields of structs that have
@trusted
functions working on them with garbage.
The language specification states that accessing void-initialized data is UB.
Indeed, and hence this PR to fix that.
This is NOT a problem specific to bools!
I would appreciate an example of UB from other non-pointer types, as I can't think of one.
There's a reason I push on this. Consider:
char[100] tmpbuffer = void;
... code to use part of the buffer ...
It's a performance issue. I don't want to relegate this to @trusted code only.
The language specification states that accessing void-initialized data is UB.
Indeed, and hence this PR to fix that.
This is NOT a problem specific to bools!
I would appreciate an example of UB from other non-pointer types, as I can't think of one. ...
module data;
int[N] array;
static assert(0<N);
struct VerifiedIndex{
private int index_=0;
@disable this(int); // shouldn't be necessary, but it is
@property int index()@safe{ return index_; }
@property void index(int i)@trusted{ // no, this can't be @safe
enforce(0<=x&&x<N, "index out of bounds");
index_=i;
}
int read()@trusted{ return array.ptr[index]; }
void write(int x)@trusted{ array.ptr[index]=x; }
}
module app;
import data;
void main()@safe{
VerifiedIndex i = void;
i.write(1337);
}
There's a reason I push on this.
I understand, but no memory corruption in @safe
code trumps performance in @safe
code every time. "I want this to be possible in @safe
code" is never more important than "@safe
guarantees no memory corruption".
Consider:
char[100] tmpbuffer = void; ... code to use part of the buffer ...
It's a performance issue. I don't want to relegate this to @trusted code only.
- The spec can define a certain subset of types for which this is
@safe
in impure functions. E.g., for structs, the constraint can be thatvoid
initialization is@safe
for all fields, and all fields arepublic
. - However, why can't you encapsulate this pattern into a struct with a
@trusted
interface? You need to do this anyway if you want to use thetmpbuffer
for non-POD data.
Just directed here from a recent forum discussion. Reading Timon's example, it makes it difficult to justify this change.
There are a couple options I can think of to address this. First, and a good possibility, is to allow only POD types to be initialized as void. Anything that has methods, or contains any types that have methods, would be prevented from being void initialized in @safe
code, due to the fact that those methods can be made to circumvent safety (via @trusted
blocks or methods).
Second option is to somehow allow the type to opt-in to void initializability. Can't think of a good syntax, but this only slightly gives more discretion. Currently, if you void initialize a member, it does nothing, as the whole type is initialized anyway.
However, it may simply be prudent to keep the current rules and fully implement them in the compiler (i.e. void initialization is not allowed in safe code), because Timon is right -- any possibility of corruption with @safe
code should be prevented. We can relax the rules later. The only thing that really sucks about this is, it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted. Is there a way around this?
... The only thing that really sucks about this is, it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted. Is there a way around this?
As I stated earlier, you could do this:
struct TmpBuffer(T,size_t n){
private T[n] buffer=void;
// ... add @trusted interface here that ensures data is initialized before use
}
Also, given that accessing uninitialized memory is made defined (but non-deterministic) behavior, void
-initialization can be kept @safe
for all types for which @safe
code can put any bit pattern into the memory occupied by the type without void
-initialization. This does not include bool
and it does not include struct
s that maintain some internal invariant.
you could do this
I don't know if that works, though. When I declare an instance of TmpBuffer
, it initializes it to 0s.
you could do this
I don't know if that works, though. When I declare an instance of
TmpBuffer
, it initializes it to 0s.
https://issues.dlang.org/show_bug.cgi?id=11331
Right, but that just means you can't use your proposed workaround/solution. Again, is there a way around it?
@schveiguy:
it's not possible to create a buffer inside a trusted escape -- the whole function must be trusted
Then wrap the rest of the function in a safe function to enable checking again:
@trusted foo() {
T[n] buffer = void;
() @safe {
// Manipulate buffer
...
}();
}
@ntrel a possibility, but not practical for templates where you want everything inferred.
@schveiguy sounds like a job for your default(@safe)
attribute then ;-)
I'm not sure that solves the problem. For example, if you want foo
to be inferred @system
for any @system
usage inside the lambda, but want to trust just the void initialization, marking the lambda default(@safe)
does not alter the @trusted
marking on foo
.
@schveiguy Yes, you're right that doesn't solve it. I think it could be done something like this:
// user code
void parent(...) {
void work(alias safeVoidBuffer) {...}
mixin UseSafeVoidBuffer!(T, n, work);
}
// library
module safevoidbuf;
struct SafeVoidBuffer(T, size_t n) {...}
template UseSafeVoidBuffer(T, size_t n, alias worker) {
() @trusted {
SafeVoidBuffer!(T, n) buf = void;
buf.initSafely();
worker!buf();
}();
static if (isSystem!(worker!(...))) // assuming we can implement isSystem
if (0) doSomethingSystem; // force system inference for parent function
}
So it might be doable. But supporting void initialization for struct fields seems like the best fix, avoiding the need for a nested function and mixin.
But supporting void initialization for struct fields seems like the best fix, avoiding the need for a nested function and mixin.
Yes.
I don't understand how we can honestly make an argument that void
initialized variables are @safe. We put ourselves on a pedestal because we initialize variables by default (whereas C and C++ have a lot of safety issues for not doing it), but when it comes to void initialization (which is literally a mean to have uninitialized variables) we consider it @safe. This is a glaring contradiction and in my opinion the fix should be to disallow void initialization in @safe code.
I would appreciate an example of UB from other non-pointer types, as I can't think of one.
@edi33416 and I have master student that tries to find holes in D's safety guarantees. He has been able to obtain random code execution from overlapping the stack with the heap by allocating a large void initialized static array on the stack. Something similar to: https://issues.dlang.org/show_bug.cgi?id=17566 .
@WalterBright please reconsider this. At the very least, it should be illegal to have void initialized stack allocated static arrays.
I don't understand how we can honestly make an argument that
void
initialized variables are@safe
.
A void
-initialized variable can be @safe
if:
- Reading from it is guaranteed by the compiler and the language spec to result in an unspecified value rather than undefined behavior.
- Every possible value of that variable's type is a safe value.
I don't think it's reasonable to allow turning
@safe pure
functions into essentially bad PRNGs and make potential purity based optimizations into undefined behavior magnifiers. I'm sure there may be even reasoanle applications for using uninitialized memory, but all of them strictly belong to@system
code. Pure functions in D are allowed a lot more freedom compared to functions in pure functional languages, but this is all based on the premise that their implementation details don't leak. Otherwisepure
may become as meaningless as areconst
parameters in C++.
I agree. Pure in D is becoming meaningless with this type of behaviours.
void main() @safe
{
auto voidInitialize(size_t Tsize)() @safe pure
{
ubyte[Tsize] _ = void;
void[Tsize] ret = _[];
return ret;
}
import std.stdio : writeln;
voidInitialize!100.writeln;
voidInitialize!100.writeln;
voidInitialize!100.writeln;
}
This should simply not be an accepted code.
Weak purity rules can be checked if any reference is passed to the function, but void initializers in a pure
function allow this code to be considered pure
, which cannot be checked at all from an outside scope, as you do by reading the function signature and check if any reference is being passed and possibly changed.
I don't understand how we can honestly make an argument that
void
initialized variables are @safe. We put ourselves on a pedestal because we initialize variables by default (whereas C and C++ have a lot of safety issues for not doing it), but when it comes to void initialization (which is literally a mean to have uninitialized variables) we consider it @safe. This is a glaring contradiction and in my opinion the fix should be to disallow void initialization in @safe code.
You miss understanding the @safe
behaviour. Void initializers if not stated as undefined behaviour and doesn't initialize a type with indirections produces valid safe
values. Inconsistent program state is different from @safe
.
My opinion on this is that pure
functions shouldn't allow returning non-setted void initialized memory although, it is perfectly @safe
, according to the @safe
rules of D as long as the type is not a reference type or has indirections.
I would appreciate an example of UB from other non-pointer types, as I can't think of one.
@edi33416 and I have master student that tries to find holes in D's safety guarantees. He has been able to obtain random code execution from overlapping the stack with the heap by allocating a large void initialized static array on the stack. Something similar to: issues.dlang.org/show_bug.cgi?id=17566 .
@WalterBright please reconsider this. At the very least, it should be illegal to have void initialized stack allocated static arrays.
I would still consider implementation defined or defined behaviour as this can be checked at runtime. Take the example of GCC flag -fstack-check
and -fstack-clash-protection
from LLVM. This prevents that stack clashing from occurring. And this code segfaults anyway with the default initializer. This code uses unsafe calls previously that may be considered undefined behaviour too because those calls don't provide a safe interface.
This code uses unsafe calls previously that may be considered undefined behaviour too because those calls don't provide a safe interface.
Not having a safe interface doesn't mean every call can be considered undefined behavior. Undefined behavior only comes up when the function is called incorrectly.
I'm not sure what code you're referring to. If it's issue 17566, then the only relevant call is the one to mmap
. As far as I can tell, it's called correctly. If you think there's a problem with how mmap
is called, please elaborate.
Not having a safe interface doesn't mean every call can be considered undefined behavior.
True, but there's a possibility. ("that may be considered undefined behaviour")
Undefined behavior only comes up when the function is called incorrectly.
No. If the function internally does something against D rules it may cause undefined behaviour like dereferencing a pointer that is pointing to an invalid object. See 12.1.1.5. point in the specification.
I'm not sure what code you're referring to. If it's issue 17566, then the only relevant call is the one to
mmap
. As far as I can tell, it's called correctly. If you think there's a problem with howmmap
is called, please elaborate.
In the mmap
call you are passing a reference that is invalid. You are manipulating a reference.
void* dst = stackBottom - sz;
mmap
may internally dereference it and cast it to another type. Casting is unsafe (See 19.24.1.3. point in the specification) and dereferencing it is undefined behaviour as that reference is manipulated to be invalid. You can say that this is valid within the stack bounds but it is not guaranteed to be a semantically behaviour defined D program according to the specification.
Not having a safe interface doesn't mean every call can be considered undefined behavior.
True, but there's a possibility. ("that may be considered undefined behaviour")
No, there isn't a different kind of "possibility" of UB for unsafe interfaces. When used incorrectly, both safe and unsafe interfaces can exhibit UB. When used correctly, they both still can, but only due to bugs.
The difference between safe and unsafe interfaces is how a correct call is defined. For safe interfaces, it's a bunch of language rules. For unsafe interfaces, it's an agreement between the author of the function and the user.
Undefined behavior only comes up when the function is called incorrectly.
No. If the function internally does something against D rules it may cause undefined behaviour like dereferencing a pointer that is pointing to an invalid object. See 12.1.1.5. point in the specification.
If an instance of UB is not due to an incorrect call, then it's a bug. Bugs can cause UB in both safe and unsafe interfaces, so they're not relevant here.
In the
mmap
call you are passing a reference that is invalid. You are manipulating a reference.void* dst = stackBottom - sz;
That's an unsafe operation. Doesn't mean it has undefined behavior. Passing a garbage pointer to an @system
function is fine if that function doesn't dereference it (and doesn't expose it to @safe
code, yada, yada, yada). It does not have more of "a possibility" of UB than calling an @safe
function (which also has "a possibility" of dereferencing an invalid pointer due to a bug).
mmap
may internally dereference it and cast it to another type.
It doesn't. You can read here what mmap
does: https://man7.org/linux/man-pages/man2/mmap.2.html
I maintain: mmap
is being used correctly. The call does not exhibit "a possibility" of undefined behavior that is worth mentioning in this context. It's the same "possibility" as when calling an @safe
function.