Bounds checking fails to compile in 16-bit mode (MSP430)
The following code, when compiled with bounds checking on, fails to compile for -mtriple=msp430 :
void test()
{
int[1] array;
int k = 0;
auto value = array[k];
}
Assertion failed: (getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"), function AssertOK, file llvm/include/llvm/IR/Instructions.h, line 1113.
0 ldc2 0x0000000108cc4f8c llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 60
1 ldc2 0x0000000108cc5559 PrintStackTraceSignalHandler(void*) + 25
2 ldc2 0x0000000108cc0ee9 llvm::sys::RunSignalHandlers() + 425
3 ldc2 0x0000000108cc58d2 SignalHandler(int) + 354
4 libsystem_platform.dylib 0x00007fff4fffcf5a _sigtramp + 26
5 libsystem_platform.dylib 0x00007fe07ed13948 _sigtramp + 785476104
6 libsystem_c.dylib 0x00007fff4fe27312 abort + 127
7 libsystem_c.dylib 0x00007fff4fdef368 basename_r + 0
8 ldc2 0x000000010615ca13 llvm::ICmpInst::AssertOK() + 243
9 ldc2 0x000000010615c90e llvm::ICmpInst::ICmpInst(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&) + 142
10 ldc2 0x000000010615c873 llvm::ICmpInst::ICmpInst(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&) + 51
11 ldc2 0x000000010615352d llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateICmp(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::Twine const&) + 221
12 ldc2 0x00000001060845eb DtoIndexBoundsCheck(Loc&, DValue*, DValue*) + 347
13 ldc2 0x00000001061f7d93 ToElemVisitor::visit(IndexExp*) + 1571
14 ldc2 0x00000001061eda26 toElem(Expression*) + 70
15 ldc2 0x00000001061f8d69 ToElemVisitor::visit(AssignExp*) + 1449
16 ldc2 0x00000001061eda26 toElem(Expression*) + 70
17 ldc2 0x000000010614f77f DtoVarDeclaration(VarDeclaration*) + 1231
18 ldc2 0x000000010614fbaa DtoDeclarationExp(Dsymbol*) + 298
19 ldc2 0x00000001061f36b3 ToElemVisitor::visit(DeclarationExp*) + 211
20 ldc2 0x00000001061edd69 toElemDtor(Expression*) + 73
21 ldc2 0x00000001061c2c6a ToIRVisitor::visit(ExpStatement*) + 282
22 ldc2 0x00000001061c2d93 ToIRVisitor::visit(CompoundStatement*) + 227
23 ldc2 0x00000001061c2d93 ToIRVisitor::visit(CompoundStatement*) + 227
24 ldc2 0x00000001061c2a92 Statement_toIR(Statement*, IRState*) + 66
25 ldc2 0x00000001061341b3 DtoDefineFunction(FuncDeclaration*, bool) + 7891
26 ldc2 0x00000001060e57f5 CodegenVisitor::visit(FuncDeclaration*) + 53
27 ldc2 0x00000001060e301b CodegenVisitor::visit(AttribDeclaration*) + 155
28 ldc2 0x00000001060e2e92 Declaration_codegen(Dsymbol*, IRState*) + 66
29 ldc2 0x00000001060e2e3f Declaration_codegen(Dsymbol*) + 31
30 ldc2 0x0000000106168f6e codegenModule(IRState*, Module*) + 1374
31 ldc2 0x00000001062a8b2c ldc::CodeGenerator::emit(Module*) + 300
32 ldc2 0x00000001062fd8ce codegenModules(Array<Module*>&) + 574
33 ldc2 0x0000000105f54553 mars_mainBody(Array<char const*>&, Array<char const*>&) + 6083
I looked around. I guess the problem will be here:
llvm::ICmpInst::Predicate cmpop = llvm::ICmpInst::ICMP_ULT;
llvm::Value *cond = gIR->ir->CreateICmp(cmpop, DtoRVal(index),
DtoArrayLen(arr), "bounds.cmp");
DtoArrayLen eventually leads to the changes I made to size_t in LDC for the MSP430, so I'm leaning towards the issue being with index not being of the proper type, but I'm a bit lost in the weeds here, so any help would be appreciated.
DtoArrayLen eventually leads to the changes I made to size_t in LDC for the MSP430, so I'm leaning towards the issue being with index not being of the proper type
Sounds reasonable. With a debugger, try outputting index->type->toChars(), that's the D type, and index->val->dump(), that's the IR value (probably a pointer, i.e., the k lvalue). As the D variable's type is int and I guess that's still 32-bits wide, you may have to truncate it (gIR->ir->CreateTrunc()).
Just a debugging trick which might or might not be applicable here: Did you try dump()-ing the LLVM module in the debugger when the assertion is hit? You will see the already emitted code, which might be helpful in wrapping your head about what is going on.
@klickverbot No, I didn't try it. My knowledge of LLVM is somewhat limited ATM, so those kind of suggestions help. I'll try that and @kinke's suggestion, and get back to you as soon as I find the time.
One interesting thing is that the bug happens even when k is a ushort (i.e. size_t for the MSP430). Another interesting thing is that D doesn't seem to care for that and extends it to a 32-bit integer for the array indexing:
(lldb) p index->val->dump()
%11 = zext i16 %10 to i32
I don't understand why calling toChars() in the index->type doesn't result in a string though:
(lldb) p index->type->toChars()
(const char *) $0 = 0x0000000000000001 <no value available>
I don't understand why calling toChars() in the index->type doesn't result in a string though
Hmm, that's D code, maybe give gdb a shot?
Another useful debugging tool is running ldc with the -vv switch. Lots of output, of special interest are the last lines before hitting the assertion.
Using CreateTrunc does indeed fix this issue. It also seems to be a no-op if the type is already correct, so I guess I don't need to do it conditionally. I think ideally the index would reach us with the correct type (without the zero extension) at the place the bounds checking is requested, as that would avoid creating busy work for the optimizer (16-bit size_t -> zext i32 -> 16-bit size_t), but I'm guessing that's related to the general desire that D has to always extend types to int, and so it would be difficult to correct, as that logic would be completely decoupled from the array indexing and bound checking.
I'll submit a pull request using CreateTrunc ok?
I'm currently playing with LDC on AVR. Bumped into this issue and did a bit of experimentation, gaining some insight.
Since D is designed for 32-bit architectures and up, it'd probably be painful to have size_t and ptrdiff_t be ushort and short respectively. Whether consciously or by accident, currently they are made uint/ints for achitectures with 16-bit pointers. PointerType.sizeof is still 2.
This is not by the book, since size_t and ptrdiff_t are supposed to be of same size as the pointers, but I'm not necessarily judging it. It's probably the best tradeoff since D code rarely expects to deal with array.length or ptrA - ptrB returning 16-bit integers, that then would have to be cast back at almost every corner.
So all fine so far. Where this falls flat is that the frontend and backend are in disagreement about the type of the integer. If you execute arr.length < 5, you will get a backend error since backend considers arr.length 16 bits wide, the frontend thinks it's already 32 bits and thus needs no cast to 32 bits. You can work around this by casting arr.length to ushort before doing anything else with it.
Thus the fix would be that when a pointer is used as a integer, backend code should automatically be generated to widen the integer to 32 bits at the backend.