torch-mlir
osx delocate doesn't seem to handle dynamically loaded libs
Latest delocate'd lib fails with
libc++abi: terminating with uncaught exception of type c10::Error: Type c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > could not be converted to any of the known types.
Exception raised from operator() at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 92 (0x1144d5e40 in libc10.dylib)
frame #1: c10::detail::getTypePtr_<c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > >::call()::'lambda'()::operator()() const + 304 (0x15befa85c in libtorch_cpu.dylib)
frame #2: c10::Type::SingletonOrSharedTypePtr<c10::Type> c10::getTypePtrCopy<c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > >() + 32 (0x15befa5b4 in libtorch_cpu.dylib)
frame #3: c10::detail::infer_schema::(anonymous namespace)::createArgumentVector(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 188 (0x15b93f7d4 in libtorch_cpu.dylib)
frame #4: c10::detail::infer_schema::make_function_schema(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 96 (0x15b93f584 in libtorch_cpu.dylib)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 60 (0x15b93fadc in libtorch_cpu.dylib)
frame #6: std::__1::unique_ptr<c10::FunctionSchema, std::__1::default_delete<c10::FunctionSchema> > c10::detail::inferFunctionSchemaFromFunctor<at::Tensor (*)(at::Tensor, c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&, double, long long)>() + 76 (0x15bf697d0 in libtorch_cpu.dylib)
frame #7: torch::CppFunction::CppFunction<at::Tensor (at::Tensor, c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&, double, long long)>(at::Tensor (*)(at::Tensor, c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&, double, long long), std::__1::enable_if<c10::guts::is_function_type<at::Tensor (at::Tensor, c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&, double, long long)>::value, std::nullptr_t>::type) + 132 (0x15bf696f4 in libtorch_cpu.dylib)
frame #8: at::native::(anonymous namespace)::TORCH_LIBRARY_IMPL_init_quantized_QuantizedCPU_4(torch::Library&) + 40 (0x15bf67a2c in libtorch_cpu.dylib)
frame #9: torch::detail::TorchLibraryInit::TorchLibraryInit(torch::Library::Kind, void (*)(torch::Library&), char const*, c10::optional<c10::DispatchKey>, char const*, unsigned int) + 208 (0x15b7cd0e4 in libtorch_cpu.dylib)
frame #10: _GLOBAL__sub_I_qconv.cpp + 88 (0x15bf6d3b8 in libtorch_cpu.dylib)
frame #11: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 164 (0x100e19d7c in dyld)
frame #12: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 168 (0x100e42f40 in dyld)
frame #13: invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 528 (0x100e39bc0 in dyld)
frame #14: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 168 (0x100e05f98 in dyld)
frame #15: dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 192 (0x100e39968 in dyld)
frame #16: dyld3::MachOAnalyzer::forEachInitializerPointerSection(Diagnostics&, void (unsigned int, unsigned int, unsigned char const*, bool&) block_pointer) const + 148 (0x100e42870 in dyld)
frame #17: dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 432 (0x100e42b70 in dyld)
frame #18: dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 172 (0x100e19cbc in dyld)
frame #19: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 216 (0x100e19e68 in dyld)
frame #20: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 180 (0x100e19e44 in dyld)
frame #21: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 180 (0x100e19e44 in dyld)
frame #22: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 180 (0x100e19e44 in dyld)
frame #23: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 180 (0x100e19e44 in dyld)
frame #24: dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 124 (0x100e19f34 in dyld)
frame #25: dyld4::APIs::dlopen_from(char const*, int, void*) + 520 (0x100e299e4 in dyld)
frame #26: _imp_create_dynamic + 1852 (0x1007ff7dc in python3.9)
frame #27: cfunction_vectorcall_FASTCALL + 208 (0x10071427c in python3.9)
frame #28: _PyEval_EvalFrameDefault + 30088 (0x1007cc078 in python3.9)
frame #29: _PyEval_EvalCode + 2968 (0x1007c44a8 in python3.9)
frame #30: _PyFunction_Vectorcall + 240 (0x1006bfe64 in python3.9)
Is this a dylib issue?
edit: sorry, I parsed the error message wrong -- still, it seems like an issue in PyTorch itself, rather than in Torch-MLIR.
Yeah, I think delocate can't find the quant library that is opened at runtime?
Anyway this is now moot since the builder builds a perfectly installable universal binary on both M1 and Intel macOS. Closing this for now.
Going to leave it open, since I may have just seen it on my Intel macOS machine.
This seems to be because of the weak linking of torch symbols. https://github.com/pytorch/pytorch/issues/48452
In our package we have:
site-packages % find . -name '*.dylib' | grep torch
./torchvision/.dylibs/libz.1.2.11.dylib
./torchvision/.dylibs/libpng16.16.dylib
./torchvision/.dylibs/libc++.1.0.dylib
./torchvision/.dylibs/libjpeg.9.dylib
./torch/lib/libtorch_python.dylib
./torch/lib/libtorch.dylib
./torch/lib/libtorch_global_deps.dylib
./torch/lib/libtorch_cpu.dylib
./torch/lib/libc10.dylib
./torch/lib/libshm.dylib
./torch_mlir/.dylibs/libtorch_python.dylib
./torch_mlir/.dylibs/libtorch.dylib
./torch_mlir/.dylibs/libomp.dylib
./torch_mlir/.dylibs/libtorch_cpu.dylib
./torch_mlir/.dylibs/libc10.dylib
./torch_mlir/.dylibs/libshm.dylib
./torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.dylib
Since we build on an x86_64 system, the files in .dylibs are a no-op on Apple silicon, so everything resolves and works OK there. On Intel they cause the double-init issue.
We probably have to link against a static version of libtorch.
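To see which libraries are shipped twice, as in the listing above, something like the following could be run against site-packages. This is only a rough sketch: the helper name find_duplicate_dylibs is made up for illustration, and it only compares file names, not the contents or install names of the libraries.

```python
import os
from collections import defaultdict


def find_duplicate_dylibs(site_packages):
    """Return {basename: [relative paths]} for .dylib files present more than once.

    Hypothetical helper: symlinks are skipped, so a tree where the
    workaround has already been applied reports no duplicates for
    the relinked libraries.
    """
    by_name = defaultdict(list)
    # os.walk also descends into hidden directories such as ".dylibs".
    for dirpath, _dirnames, filenames in os.walk(site_packages):
        for filename in filenames:
            if not filename.endswith(".dylib"):
                continue
            full_path = os.path.join(dirpath, filename)
            if os.path.islink(full_path):
                continue  # a symlink is not a second copy
            by_name[filename].append(os.path.relpath(full_path, site_packages))
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, paths in sorted(find_duplicate_dylibs(root).items()):
        print(name)
        for path in paths:
            print("   ", path)
```

Run from the venv with the site-packages directory as the only argument. Any libtorch or libc10 entry that shows up under both torch/lib and torch_mlir/.dylibs is a candidate for the double init described above.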
Workaround
# Replace mlir_venv with whatever your venv is
cd mlir_venv/lib/python3.10/site-packages/torch_mlir/.dylibs
rm *.dylib
ln -s ../../torch/lib/libc10.dylib
ln -s ../../torch/lib/libshm.dylib
ln -s ../../torch/lib/libtorch.dylib
ln -s ../../torch/lib/libtorch_cpu.dylib
ln -s ../../torch/lib/libtorch_python.dylib
@powderluv did we fix this with the static build?
This is still an issue on x86 macOS builds. But we don't care about those builds right now so ok to leave it closed.
I am experiencing this issue in an environment where two .plugins try to load the same dylibs. The first plugin loads successfully, but the second fails with the errors mentioned above.
The issue is still there on Intel Macs, and the workaround still works. Just don't delete all the .dylib files; use -f to replace only the ones that need to be linked.
ln -s -f ../../torch/lib/libc10.dylib
ln -s -f ../../torch/lib/libshm.dylib
ln -s -f ../../torch/lib/libtorch.dylib
ln -s -f ../../torch/lib/libtorch_cpu.dylib
ln -s -f ../../torch/lib/libtorch_python.dylib
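For anyone who prefers to script this, here is a rough Python equivalent of the ln -s -f commands. It is a sketch only: relink_dylibs and TORCH_LIBS are names made up for this example, and the paths in the usage note assume the same venv layout as the workaround above.

```python
import os
from pathlib import Path

# The five libraries relinked by the ln -s -f commands in this thread.
TORCH_LIBS = (
    "libc10.dylib",
    "libshm.dylib",
    "libtorch.dylib",
    "libtorch_cpu.dylib",
    "libtorch_python.dylib",
)


def relink_dylibs(dylibs_dir, torch_lib_dir, names=TORCH_LIBS):
    """Replace bundled copies in dylibs_dir with relative symlinks into torch_lib_dir.

    Hypothetical helper. Other files in dylibs_dir (for example
    libomp.dylib) are left untouched, and a name is skipped if
    torch_lib_dir has no such file, so no dangling link is created.
    """
    dylibs_dir = Path(dylibs_dir)
    torch_lib_dir = Path(torch_lib_dir)
    relinked = []
    for name in names:
        target = torch_lib_dir / name
        if not target.is_file():
            continue
        link = dylibs_dir / name
        # Same effect as ln -f: drop whatever is there before linking.
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(os.path.relpath(target, start=dylibs_dir))
        relinked.append(name)
    return relinked
```

Usage would look something like relink_dylibs("mlir_venv/lib/python3.10/site-packages/torch_mlir/.dylibs", "mlir_venv/lib/python3.10/site-packages/torch/lib"), with mlir_venv replaced by whatever your venv is.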
I have done this but am still getting the same error. After replacing, how do I re-compile?