dmd
dmd copied to clipboard
Fix 20148 - void initializated bool can be both true and false
Still need to prevent array cast and union overlap though, I'll open a follow up issue if this gets merged
Thanks for your pull request and interest in making D better, @dkorpel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:
- My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
- My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
- I have provided a detailed rationale explaining my changes
- New or modified functions have Ddoc comments (with
Params:andReturns:)
Please see CONTRIBUTING.md for more information.
If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Your PR doesn't reference any Bugzilla issue.
If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
⚠️⚠️⚠️ Warnings ⚠️⚠️⚠️
- In preparation for migrating from Bugzilla to GitHub Issues, the issue reference syntax has changed. Please add the word "Bugzilla" to issue references. For example,
Fix Bugzilla Issue 12345orFix Bugzilla 12345.(Reminder: the edit needs to be done in the Git commit message, not the GitHub pull request.)
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub run digger -- build "master + dmd#15362"
Why not just deprecate all void initialization in @safe functions?
That is a lot more controversial
I don't see how randomly choosing a different behavior for bool is less controversial. I would rather go all in and deprecate all void initialization in @safe functions, than special case a type.
This is what Simen Kjaeraas is complaining about in the bug report:
So instead of closing the obvious hole of @safe functions using void initialization we're just poking at symptoms here and there?
I don't see how randomly choosing a different behavior for
boolis less controversial.
It's not random, it's about whether the type has unsafe values. See https://dlang.org/spec/function.html#safe-values
Considering bool as a type with unsafe values was originally included in my system variables DIP:
https://github.com/dkorpel/DIPs/blob/c2078c52ae40f7ee713dc74776322aacabeb7877/DIPs/DIP1NNN-DK.md
Another issue is that, unlike pointers, the
booltype is regarded@safeto overlap or void-initialize. The problem here is that the optimizer may assume it is always0or1, but the bool type can be anyubytevalue in practice. By constructing aboolthat is larger than 1 and indexing it in an array with compile-time length 2, memory can be corrupted since the range check is elided. See issue 19968: @safe code can create invalid bools resulting in memory corruptionThe issue is marked fixed after DMD PR #10055 was merged, which defensively inserts a bitwise 'and' operation (
b & 1) whenboolvalues are promoted to integers. This is not a complete solution however, since there are also other ways in which invalid booleans give undefined behavior: void initializated bool can be both true and false. This happens because negation of aboolis simply toggling the least significant bit with the xor operator (b ^ 1), which only works if all other bits are 0. This could, again, be fixed by inserting more& 1instructions, but it starts to defeat the purpose of a specializedboolprimitive type when it cannot be guaranteed to be0or1: you might as well makeboola wrapper aroundubytethen.
It was later removed to make the DIP more focused, but the idea of considering bool a type with unsafe values is still sound.
The spec would need to be changed:
- For basic data types, all possible bit patterns are safe.
+ For basic data types, all possible bit patterns are safe, except for `bool`, for which only `0` and `1` are safe.
I'm not even sure the bug is valid and maybe the best is to disallow "dereferencing" the variable if it hasn't been assigned to yet. It's clearly not un-@safe to void-initialise a boolean, since that's perfectly memory-safe.
It's clearly not un-
@safeto void-initialise a boolean, since that's perfectly memory-safe.
That depends on what you want from your boolean type. If you want to allow bools that are neither true nor false, then it is memory-safe. If you want poor code generation defensively masking out 7/8 bits on every read, then it is memory safe. If you want a proper 2-state boolean with good code generation, it's not memory-safe to void-initialize it.
I don't see how a "quantum boolean" can violate memory safety, it's a scalar. It'll produce wrong results, and almost definitely be a bug, but not unsafe memory-wise.
I don't see how a "quantum boolean" can violate memory safety, it's a scalar.
A pointer is also just an integer. The thing that matters is whether there are invariants on the bit pattern that must hold to ensure memory safety, which is not exclusive to pointer values. dip1035 has examples of this in the rationale section.
For bool, the simplest example (from issue 19968) is this:
int f() @safe
{
int[2] arr;
bool i = void;
return arr[i]; // no bounds check inserted by compiler
}
The elision of the bounds check relies on the invariant that a bool can only be 0 or 1. This is not just a wrong result, this is memory corruption. The description of issue 20148 also has an example of memory corruption.
In general, any form of undefined behavior following from invalidated type invariants can lead to memory corruption. Memory corruption, undefined behavior, violations of type safety, wrong optimizer assumptions etc. are all basically the same thing.
int f() @safe { int[2] arr; bool i = void; return arr[i]; // no bounds check inserted by compiler }The elision of the bounds check relies on the invariant that a
boolcan only be 0 or 1. This is not just a wrong result, this is memory corruption. The description of issue 20148 also has an example of memory corruption.
I don't see memory corruption per say. Rather there's two states that any variable could be in, set, and read.
If a read occurs before a set, that is grounds for adding an error to avoid clear and obvious bugs.
I still think that making it an error to read from void-initialised variables that haven't been written to makes more sense. It's always going to be a bug, after all (unless someone is implementing a very bad RNG).
That's impossible to check in the general case. Example:
void fill(bool[] b);
bool f()
{
bool[64] buf = void;
fill(buf[]);
return buf[50]; // has it been written to?
}
What will you do here?
That's impossible to check in the general case. Example:
void fill(bool[] b); bool f() { bool[64] buf = void; fill(buf[]); return buf[50]; // has it been written to? }What will you do here?
I think it should error at fill(buf[]) because that slice is a read.
If you intend for fill to initialise it, make it an out parameter.
In an attempt to address some further rebuttals...
What about void fill(ref bool[] b);?
Still treat it as a read of buf even if you're passing it by reference.
What about fill(&buf)?
In this case, assume that it has been initialized, but it's still an error to take address of locals in @safe code (correct me if I've mistaken it for returning addresses of).
I think it should error at
fill(buf[])because that slice is a read.
Why allow void-initializing a bool[] in @safe code but then go through the trouble of making virtually every subsequent operation you'd typically do with it unusable in @safe code?
In an attempt to address some further rebuttals...
Still needs to be addressed:
- control flow (if statements)
- overlapping a bool in a union
- casting an array to a
bool[]
but it's still an error to take address of locals in
@safecode
Not with -preview=dip1000
What about
void fill(ref bool[] b);? Still treat it as a read ofbufeven if you're passing it by reference.
Saying that, I don't think it would be untenable to have a simple per-function variable/parameter state tracker, such that fill(buf[]) checks whether the first parameter to fill has a "set" state (if it was set in an if branch, can't have flag unless also unambiguously set in an else branch, etc).
But it will take some time and devotion for someone to implement it.
I don't think write before read is relevant here. Modifying the example above slightly:
int f() @safe
{
int[2] arr;
bool i = void;
arr[i ? 1 : 0] = 3;
return arr[i];
}
This generates this obviously memory corrupting code with -O on run.dlang.io:
@safe int onlineapp.f():
push RBP
mov RBP,RSP
sub RSP,010h
mov -010h[RBP],RBX
mov qword ptr -8[RBP],0
xor EAX,EAX
mov AL,DL
lea RCX,-8[RBP]
mov [RAX*4][RCX],3
mov BL,DL
and EBX,1
mov EBX,EBX
mov EAX,[RBX*4][RCX]
mov RBX,-010h[RBP]
mov RSP,RBP
pop RBP
ret
add [RAX],AL
add [RAX],AL
.text.@safe int onlineapp.f() ends
i is read from a random register, than assumed to be 0 or 1 when writing to arr[]. It doesn't even use & 1 as when using the boolean as the index directly.
I don't think write before read is relevant here.
I guess what Iain refers to is writing to the bool first, so my argument is beside the point. Sorry for the noise ;-)
Saying that, I don't think it would be untenable to have a simple per-function variable/parameter state tracker, such that fill(buf[]) checks whether the first parameter to fill has a "set" state (if it was set in an if branch, can't have flag unless also unambiguously set in an else branch, etc). But it will take some time and devotion for someone to implement it.
This is very similar, if not the same, to checking whether an immutable is being assigned or initialized inside a constructor.
Pro: there should be some existing logic in the compiler to implement this. Con: it's definitely incomplete.
If this were to be implemented, I would argue it should be extended for all void initialized types.
A very important use case for void initializations is the stack allocation of a buffer, of which only a portion of it is used. This is a major optimization.
I looked at all the various cases here and in the bugzilla where the random bits can cause trouble. Eliminating void initializations for bool still leaves holes - the bool problem with unions which is there even with no void initializations.
I looked at data flow analysis to prove initialization before use. Unfortunately, the front end is poorly equipped to do DFA. DFA in the optimizer pass cannot prove write-before-read, either (it can only prove read-before-write).
The only thing I can think of that works 100% is to, whenever a bool is read, add & 1. We can restrict this to @safe code only. The optimizer should be able to elide many instances of it. Also a read that is a copy of a bool from one bool to another does not need the mask (we still really want to use memcpy!). This won't break any existing code (unless it actually relied on those extra bits, and I don't feel sorry for breaking that!)
A very important use case for void initializations is the stack allocation of a buffer, of which only a portion of it is used.
You can still void initialize a bool array in @system / @trusted code, and in @safe code, you can void initialize a ubyte[] which is almost the same as bool[].
Eliminating void initializations for bool still leaves holes - the bool problem with unions which is there even with no void initializations.
I know, I opened the PR with "Still need to prevent array cast and union overlap". Pointers and system variables also need this. It's not hard to add bool to that list, and it works 100%
The real proposal here is in the spec PR (https://github.com/dlang/dlang.org/pull/3649) not this prototype implementation PR.
Spec PR was merged. Time to revisit this?
Yes, I need to generalize the implementation so any aggregate with a bool (or system variable) becomes unsafe to void-initialize.
@RazvanN7 This is ready now