Tapir-LLVM
Tapir-LLVM copied to clipboard
Copy constructor within spawn not outlined
When a cilk_spawn expression is outlined into a function, all sub-expressions are moved into the outlined function. For example, in cilk_spawn foo(x + y)
, the values of x
and y
are passed as arguments to the outlined function, which adds them together before calling foo
. Nested function calls are also handled in this way.
Now consider the case where one of the arguments to the spawned function is an object, passed by value. An implicit call to the object's copy-constructor occurs here. I would have expected to see the call to the copy-constructor occur within the outlined function, but in Tapir it happens before the call to the outlined function. Is this intentional? Why the inconsistent behavior?
Example:
volatile long * marker;
#include <cilk/cilk.h>
#define noinline __attribute__((noinline))
struct object {
long _value;
noinline object(long value) : _value(value) {
*marker = 0xC;
}
noinline object(const object& other) {
*marker = 0xCC;
}
};
noinline long add(long x, long y) { return x + y;}
noinline void child(long a, long b, object c)
{
*marker += a + b + c._value;
}
noinline long parent(long x)
{
object obj(0xEE);
for (long i = 0; i < 100; ++i){
// *** LOOK AT THIS SPAWN ***
// Note that obj is passed by value
cilk_spawn child(x+i, add(x, i), obj);
obj._value++;
}
cilk_sync;
return *marker;
}
Compiled with -g0 -fcilkplus -std=c++14 -O1 -emit-llvm -mllvm -debug-abi-calls
. Here's the demangled IR for the outlined function. Notice the evaluation of x + i
and the call to add
have been outlined as expected, but the copy constructor is not here.
define internal fastcc void @_Z6parentl_det.achd.cilk(i64 %x.cilk, i64 %i.07.cilk, %struct.object* nocapture readonly align 8 %agg.tmp.cilk) unnamed_addr #2 {
%__cilkrts_sf = alloca %struct.__cilkrts_stack_frame, align 8
call fastcc void @__cilkrts_enter_frame_fast_1(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
call fastcc void @__cilkrts_detach(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
%call.cilk = call i64 @add(long, long)(i64 %x.cilk, i64 %i.07.cilk)
%add.cilk = add nsw i64 %i.07.cilk, %x.cilk
call void @child(long, long, object)(i64 %add.cilk, i64 %call.cilk, %struct.object* nonnull %agg.tmp.cilk)
call fastcc void @__cilk_parent_epilogue(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
ret void
}
Here's the call to the copy constructor at the call site:
for.body: ; preds = %det.cont, %entry
%i.07 = phi i64 [ 0, %entry ], [ %inc1, %det.cont ]
call void @object::object(object const&)(%struct.object* nonnull %agg.tmp, %struct.object* nonnull dereferenceable(8) %obj)
call void asm sideeffect "stmxcsr $0\0A\09fnstcw $1", "*m,*m,~{dirflag},~{fpsr},~{flags}"(i32* %1, i16* %2) #7 %9 = call i8* @llvm.frameaddress(i32 0)
store volatile i8* %9, i8** %4, align 8
%10 = call i8* @llvm.stacksave()
store volatile i8* %10, i8** %5, align 8
%11 = call i32 @llvm.eh.sjlj.setjmp(i8* %6) #9
%12 = icmp eq i32 %11, 0
br i1 %12, label %for.body.split, label %det.cont
for.body.split: ; preds = %for.body
call fastcc void @_Z6parentl_det.achd.cilk(i64 %x, i64 %i.07, %struct.object* nonnull %agg.tmp)
br label %det.cont
Thanks for pointing out this issue.
I was in the midst of crafting a long response about why Cilk evaluates all function arguments before the spawn and why your example contains a subtlety — compiler optimizations at -O1
— that makes the behavior appear inconsistent. But thinking about it further, I think your example gets at something deeper concerning object construction and destruction with Cilk and Tapir.
Right now is a particularly busy time for me, but I'll try to get back to you soon.
Thanks for the quick response, I appreciate any time you can devote to this. Some more thoughts:
The key question is whether child
is guaranteed to have a private copy of obj
after the detach. I'd argue that it should, to preserve the semantics of pass-by-value during a spawn.
Here's my mental model of what should happen:
- Parent calls the outlined function
det.achd
-
det.achd
gets ready to callchild
, then does a detach. - After this point, other workers are free to steal
parent
's stack frame and continue. This is safe because any arguments passed tochild
live ondet.achd
's stack, having been initialized in step 2.
From the IR, it looks like all the detached children will share a single copy of obj
created on the stack of parent
. But I must be missing something, because on x86 I never observe a race condition bug here.
I've verified that the behavior for this example is the same at -O3
.
(Finally getting back to this question after a major deadline on my end, and before my next major deadline in a week.)
I agree with your mental model of what should happen in this case. I'm working through a fix, but it might take a while to implement and test.
Here's an interesting wrinkle: If I give object
a user-defined destructor, then suddenly Tapir puts the alloca
, constructor call, and destructor call in the outlined function like I expect.
struct object {
long _value;
noinline object(long value) : _value(value) {
*marker = 0xC;
}
noinline object(const object& other) {
*marker = 0xCC;
}
noinline ~object() {
*marker = 0xD;
}
};
// ...
// The rest is unchanged
; Function Attrs: noinline nounwind uwtable
define internal fastcc void @_Z6parentl_invoke.cont.cilk(i64 %x.cilk, i64 %i.015.cilk, %struct.object* align 8 %obj.cilk) unnamed_addr #2 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
for.body.cilk:
%__cilkrts_sf = alloca %struct.__cilkrts_stack_frame, align 8
call fastcc void @__cilkrts_enter_frame_fast_1(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
call fastcc void @__cilkrts_detach(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
%agg.tmp.cilk = alloca %struct.object, align 8
%0 = bitcast %struct.object* %agg.tmp.cilk to i8*
call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0)
%call.cilk = call i64 @add(long, long)(i64 %x.cilk, i64 %i.015.cilk)
%add.cilk = add nsw i64 %i.015.cilk, %x.cilk
call void @object::object(object const&)(%struct.object* nonnull %agg.tmp.cilk, %struct.object* nonnull dereferenceable(8) %obj.cilk)
call void @child(long, long, object)(i64 %add.cilk, i64 %call.cilk, %struct.object* nonnull %agg.tmp.cilk)
call void @object::~object()(%struct.object* nonnull %agg.tmp.cilk) #7
call fastcc void @__cilk_parent_epilogue(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
ret void
}