Tapir-LLVM icon indicating copy to clipboard operation
Tapir-LLVM copied to clipboard

Copy constructor within spawn not outlined

Open ehein6 opened this issue 5 years ago • 4 comments

When a cilk_spawn expression is outlined into a function, all sub-expressions are moved into the outlined function. For example, in cilk_spawn foo(x + y), the values of x and y are passed as arguments to the outlined function, which adds them together before calling foo. Nested function calls are also handled in this way.

Now consider the case where one of the arguments to the spawned function is an object, passed by value. An implicit call to the object's copy-constructor occurs here. I would have expected to see the call to the copy-constructor occur within the outlined function, but in Tapir it happens before the call to the outlined function. Is this intentional? Why the inconsistent behavior?

Example:

volatile long * marker;
#include <cilk/cilk.h>
#define noinline __attribute__((noinline))
struct object {
    long _value;
    noinline object(long value) : _value(value) {
        *marker = 0xC;
    }
    noinline object(const object& other) {
        *marker = 0xCC;
    }
};
noinline long add(long x, long y) { return x + y;}
noinline void child(long a, long b, object c)
{
    *marker += a + b + c._value;
}
noinline long parent(long x) 
{
    object obj(0xEE);
    for (long i = 0; i < 100; ++i){
        // *** LOOK AT THIS SPAWN ***
        // Note that obj is passed by value
        cilk_spawn child(x+i, add(x, i), obj);
        obj._value++;
    }
    cilk_sync;
    return *marker;
}

Compiled with -g0 -fcilkplus -std=c++14 -O1 -emit-llvm -mllvm -debug-abi-calls. Here's the demangled IR for the outlined function. Notice the evaluation of x + i and the call to add have been outlined as expected, but the copy constructor is not here.

define internal fastcc void @_Z6parentl_det.achd.cilk(i64 %x.cilk, i64 %i.07.cilk, %struct.object* nocapture readonly align 8 %agg.tmp.cilk) unnamed_addr #2 {
  %__cilkrts_sf = alloca %struct.__cilkrts_stack_frame, align 8
  call fastcc void @__cilkrts_enter_frame_fast_1(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
  call fastcc void @__cilkrts_detach(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
  %call.cilk = call i64 @add(long, long)(i64 %x.cilk, i64 %i.07.cilk)
  %add.cilk = add nsw i64 %i.07.cilk, %x.cilk
  call void @child(long, long, object)(i64 %add.cilk, i64 %call.cilk, %struct.object* nonnull %agg.tmp.cilk)
  call fastcc void @__cilk_parent_epilogue(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
  ret void
}

Here's the call to the copy constructor at the call site:

for.body:                                         ; preds = %det.cont, %entry
  %i.07 = phi i64 [ 0, %entry ], [ %inc1, %det.cont ]
  call void @object::object(object const&)(%struct.object* nonnull %agg.tmp, %struct.object* nonnull dereferenceable(8) %obj)
  call void asm sideeffect "stmxcsr $0\0A\09fnstcw $1", "*m,*m,~{dirflag},~{fpsr},~{flags}"(i32* %1, i16* %2) #7          %9 = call i8* @llvm.frameaddress(i32 0)
  store volatile i8* %9, i8** %4, align 8
  %10 = call i8* @llvm.stacksave()
  store volatile i8* %10, i8** %5, align 8
  %11 = call i32 @llvm.eh.sjlj.setjmp(i8* %6) #9
  %12 = icmp eq i32 %11, 0
  br i1 %12, label %for.body.split, label %det.cont

for.body.split:                                   ; preds = %for.body
  call fastcc void @_Z6parentl_det.achd.cilk(i64 %x, i64 %i.07, %struct.object* nonnull %agg.tmp)
  br label %det.cont

ehein6 avatar Jul 23 '19 19:07 ehein6

Thanks for pointing out this issue.

I was in the midst of crafting a long response about why Cilk evaluates all function arguments before the spawn and why your example contains a subtlety — compiler optimizations at -O1 — that makes the behavior appear inconsistent. But thinking about it further, I think your example gets at something deeper concerning object construction and destruction with Cilk and Tapir.

Right now is a particularly busy time for me, but I'll try to get back to you soon.

neboat avatar Jul 24 '19 13:07 neboat

Thanks for the quick response, I appreciate any time you can devote to this. Some more thoughts:

The key question is whether child is guaranteed to have a private copy of obj after the detach. I'd argue that it should, to preserve the semantics of pass-by-value during a spawn.

Here's my mental model of what should happen:

  1. Parent calls the outlined function det.achd
  2. det.achd gets ready to call child, then does a detach.
  3. After this point, other workers are free to steal parent's stack frame and continue. This is safe because any arguments passed to child live on det.achd's stack, having been initialized in step 2.

From the IR, it looks like all the detached children will share a single copy of obj created on the stack of parent. But I must be missing something, because on x86 I never observe a race condition bug here.

I've verified that the behavior for this example is the same at -O3.

ehein6 avatar Jul 24 '19 18:07 ehein6

(Finally getting back to this question after a major deadline on my end, and before my next major deadline in a week.)

I agree with your mental model of what should happen in this case. I'm working through a fix, but it might take a while to implement and test.

neboat avatar Aug 10 '19 11:08 neboat

Here's an interesting wrinkle: If I give object a user-defined destructor, then suddenly Tapir puts the alloca, constructor call, and destructor call in the outlined function like I expect.

struct object {
    long _value;
    noinline object(long value) : _value(value) {
        *marker = 0xC;
    }
    noinline object(const object& other) {
        *marker = 0xCC;
    }
    noinline ~object() {
        *marker = 0xD;
    }
};
// ...
// The rest is unchanged
; Function Attrs: noinline nounwind uwtable
define internal fastcc void @_Z6parentl_invoke.cont.cilk(i64 %x.cilk, i64 %i.015.cilk, %struct.object* align 8 %obj.cilk) unnamed_addr #2 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
for.body.cilk:
  %__cilkrts_sf = alloca %struct.__cilkrts_stack_frame, align 8
  call fastcc void @__cilkrts_enter_frame_fast_1(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
  call fastcc void @__cilkrts_detach(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
  %agg.tmp.cilk = alloca %struct.object, align 8
  %0 = bitcast %struct.object* %agg.tmp.cilk to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0)
  %call.cilk = call i64 @add(long, long)(i64 %x.cilk, i64 %i.015.cilk)
  %add.cilk = add nsw i64 %i.015.cilk, %x.cilk
  call void @object::object(object const&)(%struct.object* nonnull %agg.tmp.cilk, %struct.object* nonnull dereferenceable(8) %obj.cilk)
  call void @child(long, long, object)(i64 %add.cilk, i64 %call.cilk, %struct.object* nonnull %agg.tmp.cilk)
  call void @object::~object()(%struct.object* nonnull %agg.tmp.cilk) #7
  call fastcc void @__cilk_parent_epilogue(%struct.__cilkrts_stack_frame* nonnull %__cilkrts_sf)
  ret void
}

ehein6 avatar Oct 08 '19 17:10 ehein6