swift-apis

On macOS, simple models can trigger a segfault within X10

Open BradLarson opened this issue 4 years ago • 0 comments

Some simple image classification models can trigger a segfault when run on the XLA device, and only on macOS. Until this is fixed, we're explicitly having them use the eager-mode device instead.
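
A minimal sketch of what such a workaround could look like, assuming the `Device.defaultTFEager` and `Device.defaultXLA` accessors from swift-apis. The conditional-compilation check and the placeholder tensor are illustrative assumptions, not the code actually used in the affected models:

```swift
import TensorFlow

// Assumption: avoid the X10 (XLA) device on macOS until the crash is fixed,
// and keep using it on other platforms.
#if os(macOS)
let device = Device.defaultTFEager
#else
let device = Device.defaultXLA
#endif

// Hypothetical placeholder input; a real model and its batches would be
// placed on `device` in the same way.
let input = Tensor<Float>(zeros: [1, 28, 28, 1], on: device)
print(input.device)
```

Selecting the device in one place keeps the platform check out of the training loop, so it can be removed in a single edit once the XLA path works on macOS.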

The crash produces a backtrace like the following:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000104be39bb libx10.dylib`xla::XrtComputationClient::XrtData::GetOpaqueHandle() + 11
    frame #1: 0x0000000102d892ce libx10.dylib`swift_xla::XLATensor::RunPostOrder(std::__1::vector<swift_xla::XLATensor, std::__1::allocator<swift_xla::XLATensor> > const&, absl::Span<unsigned long const>) + 718
    frame #2: 0x0000000102d86605 libx10.dylib`swift_xla::XLATensor::SyncTensorsGraphInternal(std::__1::vector<swift_xla::XLATensor, std::__1::allocator<swift_xla::XLATensor> >*, absl::Span<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const>, swift_xla::XLATensor::SyncTensorsConfig const&) + 181
    frame #3: 0x0000000102d87b3a libx10.dylib`swift_xla::XLATensor::SyncTensorsGraph(std::__1::vector<swift_xla::XLATensor, std::__1::allocator<swift_xla::XLATensor> >*, absl::Span<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const>, bool, bool) + 122
    frame #4: 0x0000000102d8ad46 libx10.dylib`swift_xla::XLATensor::SyncLiveTensorsGraph(swift_xla::Device const*, absl::Span<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const>, bool) + 102
    frame #5: 0x0000000102c75366 libx10.dylib`XLATensor_LazyTensorBarrier + 118
    frame #6: 0x000000010256e22a libswiftTensorFlow.dylib`closure #1 (inout __C.DeviceList) -> () in TensorFlow.LazyTensorBarrier(on: Swift.Optional<TensorFlow.Device>, devices: Swift.Array<TensorFlow.Device>, wait: Swift.Bool) -> () + 314
    frame #7: 0x000000010256d72c libswiftTensorFlow.dylib`reabstraction thunk helper from @callee_guaranteed (@inout __C.DeviceList) -> () to @escaping @callee_guaranteed (@inout __C.DeviceList) -> (@out ()) + 12
    frame #8: 0x000000010256e2a1 libswiftTensorFlow.dylib`reabstraction thunk helper from @callee_guaranteed (@inout __C.DeviceList) -> () to @escaping @callee_guaranteed (@inout __C.DeviceList) -> (@out ())partial apply forwarder with unmangled suffix ".2" + 17
    frame #9: 0x000000010256caa1 libswiftTensorFlow.dylib`closure #2 (Swift.UnsafeBufferPointer<__C.CDevice>) -> τ_1_0 in Swift.Array<τ_0_0 where τ_0_0 == TensorFlow.Device>.withDeviceList<τ_0_0>((inout __C.DeviceList) -> τ_1_0) -> τ_1_0 + 177
    frame #10: 0x000000010256ef0f libswiftTensorFlow.dylib`partial apply forwarder for closure #2 (Swift.UnsafeBufferPointer<__C.CDevice>) -> τ_1_0 in Swift.Array<τ_0_0 where τ_0_0 == TensorFlow.Device>.withDeviceList<τ_0_0>((inout __C.DeviceList) -> τ_1_0) -> τ_1_0 + 47
    frame #11: 0x0000000101323776 libswiftCore.dylib`Swift._ArrayBuffer.withUnsafeBufferPointer<τ_0_0>((Swift.UnsafeBufferPointer<τ_0_0>) throws -> τ_1_0) throws -> τ_1_0 + 230
    frame #12: 0x00000001013343a9 libswiftCore.dylib`Swift.Array.withUnsafeBufferPointer<τ_0_0>((Swift.UnsafeBufferPointer<τ_0_0>) throws -> τ_1_0) throws -> τ_1_0 + 9
    frame #13: 0x000000010256c7d6 libswiftTensorFlow.dylib`Swift.Array<τ_0_0 where τ_0_0 == TensorFlow.Device>.withDeviceList<τ_0_0>((inout __C.DeviceList) -> τ_1_0) -> τ_1_0 + 422
    frame #14: 0x000000010256e0da libswiftTensorFlow.dylib`TensorFlow.LazyTensorBarrier(on: Swift.Optional<TensorFlow.Device>, devices: Swift.Array<TensorFlow.Device>, wait: Swift.Bool) -> () + 138
    frame #15: 0x000000010054c480 LeNet-MNIST`main at main.swift:88:9 [opt]
    frame #16: 0x00007fff704657fd libdyld.dylib`start + 1
    frame #17: 0x00007fff704657fd libdyld.dylib`start + 1

BradLarson, Jun 08 '20 18:06