Missing symbols for custom op registration
This issue is a continuation of a Twitter discussion with @dan-zheng.
What I'm trying to do is introduce a custom op and use it with S4TF. For simplicity, I'm following the official TF guide on custom op creation and compiling a sample op (zero_out) from here.
I tried compiling the op in several ways:
1. xcrun --toolchain swift-tensorflow-RELEASE-0.7 clang++ -shared ops/zero_out_ops.cc kernels/zero_out_kernels.cc -fPIC -O2 -undefined dynamic_lookup -o zero_out.so -I/Library/Python/2.7/site-packages/tensorflow_core/include -std=c++11 -ltensorflow -L/Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.7.xctoolchain/usr/lib/swift/macosx
2. xcrun --toolchain swift-tensorflow-RELEASE-0.7 clang++ -shared ops/zero_out_ops.cc kernels/zero_out_kernels.cc -fPIC -O2 -undefined dynamic_lookup -o zero_out.so -I/Library/Python/2.7/site-packages/tensorflow_core/include -std=c++11
3. g++ -shared ops/zero_out_ops.cc kernels/zero_out_kernels.cc -fPIC -O2 -undefined dynamic_lookup -o zero_out.so -I/Library/Python/2.7/site-packages/tensorflow_core/include -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0
In all cases it compiles successfully, and the resulting zero_out.so artifact loads successfully in Python TF 2.1.0-rc1 (the version whose headers I used, since it was used to build the 0.7 toolchain).
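For reference, the TensorFlow custom-op guide describes ZeroOut as taking a tensor of 32-bit integers and returning a copy with every element except the first set to zero (so in the guide's version of the op the input is int32, not Float). Once the library can be loaded from Swift, a plain-Swift reference like this could be used to sanity-check the op's output. This is a sketch only: zeroOutReference is a hypothetical helper, not part of any API, and it works on a flattened list of scalars rather than a Tensor.

```swift
/// Hypothetical reference implementation of the ZeroOut semantics from the
/// TensorFlow custom-op guide: copy the input and zero every element except
/// the first. Operates on the flattened scalars of a tensor.
func zeroOutReference(_ input: [Int32]) -> [Int32] {
  // An empty input has no first element to keep.
  guard let first = input.first else { return [] }
  var output = [Int32](repeating: 0, count: input.count)
  output[0] = first
  return output
}

// Usage:
print(zeroOutReference([5, 4, 3, 2, 1]))  // [5, 0, 0, 0, 0]
```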
Next, I try to load zero_out.so from the Swift side with this simple code:
import CTensorFlow
import TensorFlow
import Foundation
// Unused in this snippet; kept as a future input for the custom op.
let tensor = Tensor<Float>(shape: [1, 1], scalars: [0])
let path = "/Users/avolodin/Downloads/custom-op-master/tensorflow_zero_out/cc/zero_out.so"
let status = TF_NewStatus()
// Passing a Swift String to a `const char *` parameter yields a NUL-terminated
// C string; the raw bytes of a `Data` value are not NUL-terminated.
let libraryHandle = TF_LoadLibrary(path, status)
print(TF_GetCode(status))
print(String(cString: TF_Message(status)))
TF_DeleteStatus(status)
The printed message is:
dlopen(/Users/avolodin/Downloads/custom-op-master/tensorflow_zero_out/cc/zero_out.so, 6): Symbol not found: __ZTIN10tensorflow8OpKernelE
Referenced from: /Users/avolodin/Downloads/custom-op-master/tensorflow_zero_out/cc/zero_out.so
Expected in: flat namespace
in /Users/avolodin/Downloads/custom-op-master/tensorflow_zero_out/cc/zero_out.so
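The error comes from the dynamic loader: __ZTIN10tensorflow8OpKernelE is the mangled name of the typeinfo for tensorflow::OpKernel, and the mode 6 in dlopen(..., 6) equals RTLD_NOW | RTLD_LOCAL on macOS (0x2 | 0x4). Assuming TF_LoadLibrary is essentially a thin wrapper around dlopen with those flags, a minimal sketch like this should reproduce the same failure without going through TensorFlow. dlopenError(forLibraryAt:) is a hypothetical helper name.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Loads a shared library with RTLD_NOW | RTLD_LOCAL (assumed to match the
/// mode 6 in the error message). Returns nil on success, or the dynamic
/// loader's error string on failure.
func dlopenError(forLibraryAt path: String) -> String? {
  // RTLD_NOW resolves every undefined symbol immediately, so a missing
  // TensorFlow symbol is reported here instead of at first use.
  guard let handle = dlopen(path, RTLD_NOW | RTLD_LOCAL) else {
    // dlerror() describes the most recent dlopen/dlsym failure.
    if let message = dlerror() {
      return String(cString: message)
    }
    return "unknown dlopen failure"
  }
  // This is only a diagnostic, so unload the library again.
  _ = dlclose(handle)
  return nil
}

// Usage (path taken from the report above):
let soPath = "/Users/avolodin/Downloads/custom-op-master/tensorflow_zero_out/cc/zero_out.so"
if let error = dlopenError(forLibraryAt: soPath) {
  print(error)
}
```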
After some investigation, I think this happens because Swift's build-script-impl specifies
--define framework_shared_object=false
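which, according to the official TF docs, means no separate libtensorflow_framework.so is built, and that library is where the symbols needed for custom op registration live: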
Note that :framework and :lib have incomplete transitive dependencies (they declare but do not define some symbols) if framework_shared_object=True (meaning there is an explicit framework shared object). Missing symbols are included in //tensorflow:libtensorflow_framework.so. This split supports custom op registration; see comments on //tensorflow:libtensorflow_framework.so. It does mean that TensorFlow cc_test and cc_binary rules will not build. Using tf_cc_test and tf_cc_binary (from //tensorflow/tensorflow.bzl) will include the necessary symbols in binary build targets.
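Because zero_out.so was linked with -undefined dynamic_lookup, its undefined symbols have to be provided by an image that is already loaded (the "flat namespace" in the error). One way to test the hypothesis, assuming the toolchain's TensorFlow library has already been loaded by the Swift program, is to ask the loader whether the OpKernel typeinfo is visible from the current process. isSymbolResolvable is a hypothetical helper; on macOS the name passed to dlsym drops the extra leading underscore shown in the error.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Returns true if `symbol` can be resolved from the images already loaded
/// into this process. Pass the C-level name: on macOS, drop the extra
/// leading underscore the loader prints (`__ZTI...` becomes `_ZTI...`).
func isSymbolResolvable(_ symbol: String) -> Bool {
  #if canImport(Darwin)
  // RTLD_DEFAULT is ((void *)-2) on Darwin; the macro is not imported into Swift.
  let searchAll = UnsafeMutableRawPointer(bitPattern: -2)
  #else
  // RTLD_DEFAULT is ((void *)0) on glibc.
  let searchAll: UnsafeMutableRawPointer? = nil
  #endif
  return dlsym(searchAll, symbol) != nil
}

// Typeinfo for tensorflow::OpKernel, the symbol zero_out.so failed to find.
print(isSymbolResolvable("_ZTIN10tensorflow8OpKernelE"))
```

If this prints false in a Swift for TensorFlow process, no loaded library exports the symbol, which would be consistent with the framework_shared_object=false explanation; on its own it does not prove that rebuilding with the flag set to true fixes the issue.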
Thanks for filing this issue and doing some thorough investigation!
As you noted, libtensorflow_framework.so seems necessary for your custom op registration. I'm not sure we've tried custom TensorFlow op registration in open source, so this is cool! I'm not very familiar with libtensorflow_framework.so.
Previously (during Swift for TensorFlow 0.2 - 0.5?), we did build libtensorflow_framework.so. However, we removed it: I think because it wasn't/isn't needed for tensorflow/swift-apis and had no known users, but maybe also because it caused some linker errors. @pschuh: do you remember why we removed it?
We could try building a toolchain with libtensorflow_framework.so (via --define framework_shared_object=true) to see if that resolves your issue.
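When trying a rebuilt toolchain, a quick check that it actually ships the framework library can save a debugging round. The sketch below assumes the toolchain's TensorFlow libraries live in the same directory the first compile command passes via -L, and matches only on the libtensorflow_framework file-name prefix, since the exact extension and version suffix may vary; frameworkLibraries(in:) is a hypothetical helper.

```swift
import Foundation

/// Lists files in `directory` whose names start with "libtensorflow_framework".
/// An empty result may indicate a toolchain built with
/// framework_shared_object=false (or a wrong directory).
func frameworkLibraries(in directory: String) -> [String] {
  // A missing or unreadable directory is treated as empty.
  let contents = (try? FileManager.default.contentsOfDirectory(atPath: directory)) ?? []
  return contents.filter { $0.hasPrefix("libtensorflow_framework") }.sorted()
}

// Assumption: library directory copied from the -L flag in the first compile command.
let toolchainLibDir =
  "/Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.7.xctoolchain/usr/lib/swift/macosx"
print(frameworkLibraries(in: toolchainLibDir))
```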
I got it working!
It was fairly easy (apart from waiting 3 hours for my MBP to build the whole toolchain from scratch). I did two things:
- Made the makeOp function public so that users can create their own ops
- Changed the Bazel flag to framework_shared_object=true
And it just worked!
I've opened a PR making the function public here, but patching build-script-impl turned out to be difficult because the tensorflow and tensorflow-0.7 branches have diverged too much in build-script-impl. Could you guide me here?