About RegisterOperators in C++
🐛 Bug
I'm testing JIT operator registration in C++ (libtorch).
My test case:
#include <iostream>
#include <torch/script.h>

int main(int argc, const char* argv[]) {
  auto& ops = torch::jit::getAllOperators();
  std::cout << "torch jit operators\n";
  for (auto& op : ops) {
    auto& name = op->schema().name();
    if (name.find("torchvision") != std::string::npos)
      std::cout << "op : " << name << "\n";
  }
  std::cout << "\n";
  return 0;
}
Actual output:
>>> torch jit operators
Expected output (I only get this after changing some code in vision):
>>> torch jit operators
op : torchvision::_cuda_version
op : torchvision::deform_conv2d
op : torchvision::ps_roi_pool
op : torchvision::ps_roi_align
op : torchvision::_new_empty_tensor_op
op : torchvision::roi_pool
op : torchvision::roi_align
op : torchvision::nms
In the first case, it seems no JIT operators are registered with libtorch.
Does this only occur in my environment, or is something wrong in the code? I only get the expected output after changing the sources.
Environment
Install command follows the README
torch 1.5.0 and torchvision 0.6.0
This issue is better discussed on the forum. I will close this issue, but please feel free to re-open with more information and details about what was changed so that we can reproduce.
Moreover, if you encounter an issue with jit, I encourage you to open an issue on pytorch instead.
@vincentqb This is not a PyTorch problem. It's about calling the registration API when using C++ torchvision.
The problem is that the registration code is not called when a custom project includes torchvision.
When I moved the registration parts into a header, the project compiles the registration code directly and it runs.
Could you check the JIT operators in C++?
If I understand correctly: you have one version that works, and another that doesn't? Can you provide a complete minimal example to run with how you compile so we can reproduce?
cc @fmassa
@vincentqb Basically I just installed with cmake and ran hello_world. I'm now trying 0.6, but 0.5 has the same problem.
- I installed following README.rst:
cmake -DCMAKE_PREFIX_PATH="/home/user/torch_build/libtorch150" -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_CUDA=on ..
make -j16
make install
- collect_env.py
user(t3) in ~/torch_build via env150
➜ python3 collect_env.py
Collecting environment information...
PyTorch version: 1.5.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 8.3.0-16ubuntu3~16.04) 8.3.0
CMake version: version 3.17.0
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 418.87.01
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.4
Versions of relevant libraries:
[pip3] numpy==1.18.3
[pip3] torch==1.5.0+cu101
[pip3] torchvision==0.6.0+cu101
[conda] Could not collect
- main.cpp
- Just changed examples/cpp/hello_world/main.cpp:
#include <iostream>
#include <torchvision/vision.h>
int main()
{
  auto& ops = torch::jit::getAllOperators();
  std::cout << "torch jit operators\n";
  for (auto& op : ops) {
    auto& name = op->schema().name();
    if (name.find("torchvision") != std::string::npos)
      std::cout << "op : " << name << "\n";
  }
  std::cout << "\n";
  return 0;
}
And I run the main function above. I expect torch::jit::getAllOperators() in main to find the operators related to torchvision, but in my environment it shows nothing.
So I moved the torch::RegisterOperators() parts into the header (.h) instead of the .cpp. With that change, the compiler sees the registration code directly, and the torchvision operators show up as I expected.
In my experience, I hit the same problem with my own custom kernel when using JIT: once I made the registration run directly, everything works and I can find my kernel in getAllOperators().
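For context, a minimal sketch of the static-registration pattern being discussed, assuming the torch::RegisterOperators API from this libtorch era; the op name and kernel below are illustrative, not torchvision's actual code:
#include <torch/script.h>

// Illustrative kernel only -- stands in for a real operator implementation.
torch::Tensor times_two(torch::Tensor x) {
  return x * 2;
}

// The constructor of this static object runs at library-load time and registers
// the op with the JIT registry. If the object lives in a .cpp inside a library
// whose symbols the executable never references, the linker may drop the whole
// object file, and the registration never runs.
static auto registry = torch::RegisterOperators("myops::times_two", &times_two);
Moving such a definition into a header (as described above) forces every including translation unit to compile and run it, which is why the operators then appear.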
Thanks for the report @zsef123 !
ccing @suo, as I believe this is currently an issue with the torchscript RegisterOperators (or our use of it), which makes the operators not readily available to users, who instead need to re-register the ops in their own code.
@fmassa So the vision lib intends for each user to manually register the ops in their project, rather than registering them automatically when the torchvision header is included, is that right? I think I misunderstood that design.
That's currently what's needed, but I don't think this is a desirable setup.
Maybe we just need to move the RegisterOperator calls from the C++ file into the headers in torchvision, so that they get registered automatically once the user includes it.
Let's keep this issue open until we figure out a better way of handling this
I think what's happening is that the linker is throwing out the static registration of the operators when linking in the torchvision lib, since the RegisterOperator object is never referenced from the main executable. Can you try linking in torchvision with -Wl,--whole-archive?
@suo Thanks for the comments!!
I tried something like the code below.
CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(hello-world)
find_package(TorchVision REQUIRED)
add_executable(hello-world main.cpp)
target_link_libraries(hello-world
-Wl,--whole-archive
TorchVision::TorchVision
)
But these errors occur:
[ 50%] Linking CXX executable hello-world
/usr/lib/gcc/x86_64-linux-gnu/8/libgcc.a(_muldi3.o): In function `__multi3':
(.text+0x0): multiple definition of `__multi3'
/usr/lib/gcc/x86_64-linux-gnu/8/libgcc.a(_muldi3.o):(.text+0x0): first defined here
/usr/lib/gcc/x86_64-linux-gnu/8/libgcc.a(_negdi2.o): In function `__negti2':
(.text+0x0): multiple definition of `__negti2'
...
Is this a problem with my setup?
user(t3) in vision/examples/cpp/hello_world/build on master [!?] via env150 took 2s
➜ cmake --version
cmake version 3.17.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).
user(t3) in vision/examples/cpp/hello_world/build on master [!?] via env150
➜ gcc --version
gcc (Ubuntu 8.3.0-16ubuntu3~16.04) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
whole-archive is not the right flag. The real problem is that, because your executable isn't using any symbols from torchvision, the linker is pruning it away. You can verify that this is the case by running ldd on the resulting executable; you will see that there is a dependency on libtorch.so but not on torchvision. If you specify
target_link_libraries(hello-world
-Wl,--no-as-needed
TorchVision::TorchVision
)
This will force the linker to include the library.
@fmassa we should consider updating the TorchVision cmake target to automatically specify --no-as-needed, this will help prevent this common failure mode.
@bmanga could you look into adding --no-as-needed in the torchvision CMakeLists?
@fmassa May I suggest using an inline variable or a function to register the operators instead? I can give that a try first, so we can avoid hacks in the cmake scripts.
@ezyang what do you think?
I think it might be preferable to avoid having to ask users to call register_torchvision_ops() in their C++ files.
@fmassa Inline variables should work just like the current approach. Officially they are C++17, but we can get essentially identical functionality using __attribute__((weak)).
I personally don't mind a RegisterOps function. I think there might be users of torchvision that don't need them, and for those who do it's just a function call in their main.
Either approach is IMO better than using linker hacks, otherwise we end up with a mess like pytorch's cmake scripts. As a bonus, we can even remove the hardcoded requirement that the lib must be built as a shared library.
EDIT: I think __attribute__((constructor)) is exactly what we need here. I will try to create a PR tomorrow.
@bmanga I don't think __attribute__((constructor)) will work? It will put things in the ctors section of your library, which is the same thing that static initializers do, and the linker will prune it away if you literally don't depend on any symbols.
There's nothing wrong with having a register operators function. It doesn't do any harm, and it will save you from having to do --no-as-needed. But, for better or worse, PyTorch as a library relies a LOT on static initializers, and I think it's better to fall in line with the prevailing convention as a downstream library (torchvision). So I am advocating that for the default behavior, we should pass in the right linker flags. It is not too much cmake.
cc'ing @malfet too
@ezyang true, late night thoughts should be left for the morning 😅
I think it is a bit unreasonable to expect the operators to be registered just by linking the library without even including a single header. Is that really a common case?
What about the following?
- Have the user include any header from the library, even just vision.h
- Document in the README that if you are only planning to use torchvision ops, you must either include something or add set_target_properties(exe PROPERTIES LINK_WHAT_YOU_USE TRUE) to the torchvision consumer target.
If you think that is still not acceptable, I will copy the code from caffe2_interface_library macro.
Hi @bmanga
Thanks for the discussion. After seeing it unfold here, I think it might be preferable to follow PyTorch's convention.
@bmanga can you clarify why we would need to copy the code from caffe2_interface_library, instead of just passing --no-as-needed?
@fmassa because if we really want this behavior:
- it should be done for Windows, Mac and Linux
- we want to limit it to torchvision only, so we need to add "-Wl,--no-as-needed,libtorchvision.so,-Wl,--as-needed"
- if we can, it would be good not to tie ourselves into requiring that torchvision be built as a shared library
caffe2_interface_library mostly solves these problems, although it is not completely correct either because it assumes that the user target has --no-as-needed enabled by default, which may not be the case and be a source of other issues.
I agree that writing the relevant cmake code portably is not so easy. I am a bit curious how the #include trick would work, since I feel like you still have to use the symbols somehow; just externing this isn't enough. (Maybe if the header sets some weak symbol?)
@ezyang yes, that's what I had in mind when I mentioned inline variables.
In the vision.h header, we could have something like:
#if (defined __cpp_inline_variables) || __cplusplus >= 201703L
#define VISION_INLINE_VARIABLE inline
#else
#ifdef _MSC_VER
#define VISION_INLINE_VARIABLE __declspec(selectany)
#else
#define VISION_INLINE_VARIABLE __attribute__((weak))
#endif
#endif
namespace vision {
namespace impl {
// Defined in a .cpp inside the library; performs the actual op registration.
int RegisterOps() noexcept;
// Every TU that includes vision.h defines this variable and therefore
// references RegisterOps(), so the library cannot be pruned away and the
// registration runs exactly once.
VISION_INLINE_VARIABLE int dummy = RegisterOps();
}
}
And that should be enough to ensure that the library is loaded (RegisterOps is defined in a cpp file), and that the code is run only once.
It feels a bit dirty doing this, but I'm willing to give it a try.
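As a rough sketch of the out-of-line half this header assumes (the RegisterOps definition living in the library's .cpp); the nms declaration here only stands in for torchvision's real kernels:
#include <torch/script.h>

// Declaration standing in for the library's real kernel (defined elsewhere).
at::Tensor nms(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold);

namespace vision {
namespace impl {

int RegisterOps() noexcept {
  // Constructed once, on the call driven by the header's weak/inline dummy
  // variable; the constructor registers the ops with the JIT registry.
  static auto registry = torch::RegisterOperators("torchvision::nms", &nms);
  return 0;
}

} // namespace impl
} // namespace vision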
Hi. I am a maintainer of the Homebrew package manager. We have been struggling to update to pytorch 2.1.0 and torchvision 0.16.0 for a few weeks now. Both build fine, but we have a small test that fails to run for torchvision.
We build the following binary as a small test:
#include <assert.h>
#include <torch/script.h>
#include <torch/torch.h>
#include <torchvision/vision.h>
int main() {
auto& ops = torch::jit::getAllOperatorsFor(torch::jit::Symbol::fromQualString("torchvision::nms"));
assert(ops.size() == 1);
}
We are then building / linking with:
usr/bin/g++-11 -std=c++17 test.cpp -o test -fopenmp -I/home/linuxbrew/.linuxbrew/opt/pytorch/include -I/home/linuxbrew/.linuxbrew/opt/pytorch/include/torch/csrc/api/include -I/home/linuxbrew/.linuxbrew/Cellar/torchvision/0.16.0/include -L/home/linuxbrew/.linuxbrew/opt/pytorch/lib -Wl,-rpath,/home/linuxbrew/.linuxbrew/opt/pytorch/lib -ltorch -ltorch_cpu -lc10 -Wl,--whole-archive -L/home/linuxbrew/.linuxbrew/Cellar/torchvision/0.16.0/lib -ltorchvision
Unfortunately the operators are not registered. I see various pull requests have been merged and there was some back and forth, but it is unclear to me from all that whether there was a change in behaviour with the latest torchvision release. Is there anything special we need to do (register the operators manually? use -Wl,--whole-archive?)?
Any help would be appreciated so we can update the package in Homebrew. Let me know if it's ok to revive this old issue, or if I should open a separate issue.
You need --no-as-needed
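For example (a hedged adaptation of the link command above, keeping the original paths and replacing the --whole-archive portion; --no-as-needed has to appear before -ltorchvision):
usr/bin/g++-11 -std=c++17 test.cpp -o test -fopenmp -I/home/linuxbrew/.linuxbrew/opt/pytorch/include -I/home/linuxbrew/.linuxbrew/opt/pytorch/include/torch/csrc/api/include -I/home/linuxbrew/.linuxbrew/Cellar/torchvision/0.16.0/include -L/home/linuxbrew/.linuxbrew/opt/pytorch/lib -Wl,-rpath,/home/linuxbrew/.linuxbrew/opt/pytorch/lib -ltorch -ltorch_cpu -lc10 -Wl,--no-as-needed -L/home/linuxbrew/.linuxbrew/Cellar/torchvision/0.16.0/lib -ltorchvision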
Cool; thanks that worked! :)