Add PatchMatchNet module for MVS and calculation of normals from depth
This work integrates PatchMatchNet functionality into colmap using a TorchScript pre-trained module. Additionally, it introduces functionality to calculate normal maps from depth maps, since PatchMatchNet evaluation does not create normal maps as part of its process. More details about the changes:
- Colmap can compile with Torch support to enable PatchMatchNet. For this, the pre-compiled LibTorch library needs to be downloaded from https://pytorch.org/ for the desired configuration (GPU or CPU-only) and the archive extracted under `<colmap-root>/lib/`, thus creating a `libtorch` subfolder. CMake should then be able to find the dependency and set the correct compilation flags.
- PatchMatchNet can now be enabled from `patch_match_stereo` by setting the `mvs_module_path` option to a valid TorchScript module. One such module is included as part of this PR in `<colmap-root>/mvs-modules/patchmatchnet-module.pt`.
- The TorchScript interface is fairly generic, using the following input structure: `(images: List[Tensor], intrinsics: Tensor, extrinsics: Tensor, depth_params: Tensor)`, with the output being a tuple of `(depth: Tensor, confidence: Tensor)`. Thus any module that subscribes to that input/output format for forward evaluation can be used instead.
- Functionality of standard patch-match remains unchanged. There is now an inheritance structure used to select between standard and PMNet processing.
- Normal maps are now not required for stereo fusion. If missing they will be calculated from the depth maps themselves. This is needed to accommodate PMNet processing that does not produce normal maps as part of the estimation work.
- Note that use of calculated normal maps can be forced even for standard patch-match processing through the new stereo fusion option `--StereoFusion.calculate_normals`.
- Confidence maps can now be used for stereo fusion, and they are optional. If missing, a confidence of 1 is assumed everywhere. This is added to make use of the confidence maps created as part of PMNet estimation.
- A new method for finding related images for fusion based on triangulation scoring is introduced and can be enabled with the option `--StereoFusion.use_triangulation_scoring`. This is included for parity with PatchMatchNet, which uses this method for finding related images instead of the colmap default (useful for comparing results between colmap and Python).
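The normals-from-depth step described above can be illustrated with a minimal pure-Python sketch (hypothetical, not the actual colmap implementation; the camera-facing sign convention is an assumption):

```python
# Minimal sketch (hypothetical, not colmap's code): estimate a normal map
# from a depth map via central differences and a cross product.
def normals_from_depth(depth):
    h, w = len(depth), len(depth[0])
    # Border default: fronto-parallel normal facing the camera (sign assumed).
    normals = [[(0.0, 0.0, -1.0)] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            # Surface tangents t_x = (1, 0, dzdx), t_y = (0, 1, dzdy);
            # t_x x t_y = (-dzdx, -dzdy, 1), flipped here to face the camera.
            nx, ny, nz = dzdx, dzdy, -1.0
            norm = (nx * nx + ny * ny + nz * nz) ** 0.5
            normals[y][x] = (nx / norm, ny / norm, nz / norm)
    return normals
```

A fronto-parallel (constant) depth map yields (0, 0, -1) everywhere in the interior; a plane sloping in x yields a unit normal tilted along x.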
I've been trying to compile this but I get the following error:
-- Caffe2: CUDA detected: 11.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 11.2
-- Found cuDNN: v8.1.0 (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Autodetected CUDA architecture(s): 6.1
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
-- Build type specified as Release
-- Enabling SIMD support
-- Enabling OpenMP support
-- Disabling interprocedural optimization
-- Autodetected CUDA architecture(s): 6.1
-- Enabling CUDA support (version: 11.2, archs: sm_61)
-- Enabling LibTorch support
-- Enabling OpenGL support
-- Disabling profiling support
-- Enabling CGAL support
-- Configuring done
CMake Error in src/CMakeLists.txt:
Imported target "torch" includes non-existent path
"MKL_INCLUDE_DIR-NOTFOUND"
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
* The path was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and references files it does not
provide.
Which libtorch/CUDA version are you using? I've tried CUDA 11.2, cuDNN v8.1.0, MKL 2020.04, and libtorch 1.7.1 on a 1080 Ti. Same for CUDA 10.2 with cuDNN v7.
@Dawars It seems that LibTorch requires MKL as a dependency even though it already contains the headers and binaries in the LibTorch package itself. See if installing MKL on your system would resolve your issue.
On my end I made some modifications in the CMake configurations of LibTorch itself to make things work. I'll see if I can make changes in colmap CMake instead and have things work with vanilla LibTorch.
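For illustration, the kind of change that could live in colmap's CMake instead of a patched LibTorch might look like the following (a sketch under assumptions: variable names beyond `MKL_INCLUDE_DIR` and the exact path are hypothetical, not the final implementation). The idea is to pre-seed the MKL variable with LibTorch's bundled headers before `find_package(Torch)` runs, so its `mkl.cmake` never ends up with `MKL_INCLUDE_DIR-NOTFOUND`:

```cmake
# Hypothetical sketch: point MKL_INCLUDE_DIR at LibTorch's bundled headers
# before Torch is found, so LibTorch's mkl.cmake does not fail without MKL.
set(MKL_INCLUDE_DIR "${CMAKE_SOURCE_DIR}/lib/libtorch/include"
    CACHE PATH "Fallback MKL include dir for LibTorch" FORCE)
find_package(Torch REQUIRED)
```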
For reference here's a diff between my modified LibTorch and the vanilla one (LibTorch 1.7.1 for CUDA 10.1 with CUDNN 7.6.0)
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/ATen/Parallel.h" "b/lib\\libtorch/include/ATen/Parallel.h"
index 9e2f9be..cc652f2 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/ATen/Parallel.h"
+++ "b/lib\\libtorch/include/ATen/Parallel.h"
@@ -38,7 +38,7 @@ namespace internal {
// Initialise num_threads lazily at first parallel call
inline CAFFE2_API void lazy_init_num_threads() {
- thread_local bool init = false;
+ static thread_local bool init = false;
if (C10_UNLIKELY(!init)) {
at::init_num_threads();
init = true;
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/c10/util/StringUtil.h" "b/lib\\libtorch/include/c10/util/StringUtil.h"
index d2744f1..79da0ae 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/c10/util/StringUtil.h"
+++ "b/lib\\libtorch/include/c10/util/StringUtil.h"
@@ -74,7 +74,7 @@ struct _str_wrapper<const char*> final {
template<>
struct _str_wrapper<> final {
static const std::string& call() {
- thread_local const std::string empty_string_literal;
+ static thread_local const std::string empty_string_literal;
return empty_string_literal;
}
};
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/share/cmake/Caffe2/public/cuda.cmake" "b/lib\\libtorch/share/cmake/Caffe2/public/cuda.cmake"
index 8b60915..041e19b 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/share/cmake/Caffe2/public/cuda.cmake"
+++ "b/lib\\libtorch/share/cmake/Caffe2/public/cuda.cmake"
@@ -480,7 +480,7 @@ endforeach()
# Set C++14 support
set(CUDA_PROPAGATE_HOST_FLAGS_BLACKLIST "-Werror")
if(MSVC)
- list(APPEND CUDA_NVCC_FLAGS "--Werror" "cross-execution-space-call")
+ # list(APPEND CUDA_NVCC_FLAGS "--Werror" "cross-execution-space-call")
list(APPEND CUDA_NVCC_FLAGS "--no-host-device-move-forward")
else()
list(APPEND CUDA_NVCC_FLAGS "-std=c++14")
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/share/cmake/Caffe2/public/mkl.cmake" "b/lib\\libtorch/share/cmake/Caffe2/public/mkl.cmake"
index 9515a4a..c68074b 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/share/cmake/Caffe2/public/mkl.cmake"
+++ "b/lib\\libtorch/share/cmake/Caffe2/public/mkl.cmake"
@@ -1,4 +1,4 @@
-find_package(MKL QUIET)
+set(MKL_INCLUDE_DIR ${CMAKE_TORCHLIB_PATH}/include)
if(NOT TARGET caffe2::mkl)
add_library(caffe2::mkl INTERFACE IMPORTED)
@Dawars @ahojnnes I updated colmap's CMake to set the MKL flags without needing the full dependency for LibTorch to build. Also, I removed an NVCC flag set by LibTorch that was causing issues with Eigen/Core.
However I'm not sure what to do with this part of the diff:
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/ATen/Parallel.h" "b/lib\\libtorch/include/ATen/Parallel.h"
index 9e2f9be..cc652f2 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/ATen/Parallel.h"
+++ "b/lib\\libtorch/include/ATen/Parallel.h"
@@ -38,7 +38,7 @@ namespace internal {
// Initialise num_threads lazily at first parallel call
inline CAFFE2_API void lazy_init_num_threads() {
- thread_local bool init = false;
+ static thread_local bool init = false;
if (C10_UNLIKELY(!init)) {
at::init_num_threads();
init = true;
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/c10/util/StringUtil.h" "b/lib\\libtorch/include/c10/util/StringUtil.h"
index d2744f1..79da0ae 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/c10/util/StringUtil.h"
+++ "b/lib\\libtorch/include/c10/util/StringUtil.h"
@@ -74,7 +74,7 @@ struct _str_wrapper<const char*> final {
template<>
struct _str_wrapper<> final {
static const std::string& call() {
- thread_local const std::string empty_string_literal;
+ static thread_local const std::string empty_string_literal;
return empty_string_literal;
}
};
I'm not sure if the issue with thread_local having to be static is specific to MSVC (windows) or if it happens on other platforms as well, since I have no good way to test this cross-platform.
I'll try it out on Ubuntu and let you know. One problem for me was that CMake found the mkl.cmake from CGAL, which was installed on my machine; maybe we need to supply a custom version with colmap.
Now it compiles and runs fine, no additional cmake config needed for mkl.
However the model file seems to be corrupted. I get the following error at: torch::jit::load(options_.mvs_module_path, kDevIn);
cache_size: 20
write_consistency_graph: 0
mvs_module_path: /home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module.pt
allow_missing_files: 0
First definition of patch-match module for thread index: 0
Signal: SIGSEGV (signal SIGSEGV: invalid address (fault address: 0x0))
Process finished with exit code 9
I checked it in Python and Netron as well:
Error loading Python module. Unknown expression '=' in 'patchmatchnet-module3.pt'.
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.18.1
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] on linux
import torch
with open('/home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module.pt') as f:
model = torch.load(f)
Traceback (most recent call last):
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-f97813dbac00>", line 2, in <module>
model = torch.load(f)
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/serialization.py", line 572, in load
if _is_zipfile(opened_file):
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/serialization.py", line 56, in _is_zipfile
byte = f.read(1)
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 72: invalid start byte
with open('/home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module3.pt') as f:
model = torch.load(f)
Traceback (most recent call last):
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-a6ef56580e99>", line 2, in <module>
model = torch.load(f)
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/serialization.py", line 572, in load
if _is_zipfile(opened_file):
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/serialization.py", line 56, in _is_zipfile
byte = f.read(1)
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 72: invalid start byte
I can load the module just fine in C++ and Python 3.8.5 on Windows using torch.jit.load; even torch.load works as well with a warning like this:
...Python\Python38\site-packages\torch\serialization.py:589: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"
Wondering if there's some issue with committing the binary as part of the repo, or an issue with the Python version. See if it will run with a different Python version. Also, I can send you the module file directly so we can see if it's an issue caused when the file gets committed.
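If the repo is indeed mangling the binary, one possible safeguard (this is an assumption about the cause, not a confirmed fix, and the path is taken from this PR's layout) would be to mark the module as binary in `.gitattributes` so git never applies text or line-ending conversion to it:

```
# Hypothetical .gitattributes entry: treat TorchScript modules as binary.
mvs-modules/*.pt binary
```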
@Dawars one more thing you can try, in case it's an issue with encodings between Windows and Linux, is to pull PatchMatchNet from the tip of my branch here: https://github.com/anmatako/PatchmatchNet
Then uncomment these 3 lines here: https://github.com/anmatako/PatchmatchNet/blob/e21992b1c2d028536403632eb1bf4bfb1aa8f176/eval.py#L97-L99
and you can run from within the root folder of PatchmatchNet as follows:
python eval.py --output_folder <your output folder> --checkpoint_path checkpoints/patchmatchnet-params.pt --input_type params --output_type depth
This will create a new TorchScript module named patchmatchnet-module.pt in your specified output folder. If you can load that module then it must be some conversion issue between OSes.
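Since a TorchScript archive is a regular zip file, a quick way to check whether a committed copy survived intact, without installing torch, is stdlib `zipfile` (a hedged sketch; the expected `data.pkl` entry name is an assumption about the archive layout):

```python
import zipfile

def looks_like_torchscript(path):
    """Rough integrity check: a TorchScript module is a zip archive whose
    entries include a serialized data pickle (entry naming is an assumption)."""
    if not zipfile.is_zipfile(path):
        return False  # e.g. a corrupted header from text/EOL conversion
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # verifies the CRC of every entry
        names = zf.namelist()
        return bad is None and any(n.endswith("data.pkl") for n in names)
```

If this returns False for the committed file but True for a freshly exported one, the corruption happened in transit (e.g. during commit) rather than at export time.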
With PyTorch 1.7.1 I can read the model file properly. I think the problem is that libtorch tries to open the file as a text file, not binary, that was one of my problems with Python.
I tried explicitly setting the file mode via:
std::ifstream model_file(options_.mvs_module_path, std::ios::in | std::ios::binary);
model_[thread_index_] = torch::jit::load(model_file, kDevIn);
but I still get the same result.
Probably I'll have to compile a debug version of libtorch for linux to get more info. I have little experience with it but I'll try.
Here is the stack trace:
First definition of patch-match module for thread index: 0
Signal: SIGSEGV (signal SIGSEGV: invalid address (fault address: 0x0))
*** Aborted at 1614714901 (unix time) try "date -d @1614714901" if you are using GNU date ***
PC: @ 0x7f2b751ee986 std::__detail::_Executor<>::_M_dfs()
*** SIGSEGV (@0x3e8000044a0) received by PID 17575 (TID 0x7f2b22fc4700) from PID 17568; stack trace: ***
@ 0x7f2b84b3a631 (unknown)
@ 0x7f2b8305f3c0 (unknown)
@ 0x7f2b751ee986 std::__detail::_Executor<>::_M_dfs()
@ 0x7f2b751eeb53 std::__detail::_Executor<>::_M_dfs()
@ 0x7f2b751eec6c std::__detail::_Executor<>::_M_dfs()
@ 0x7f2b751ef412 std::__detail::__regex_algo_impl<>()
@ 0x7f2b319995fe c10::Device::Device()
@ 0x7f2b7544963d torch::jit::Unpickler::readInstruction()
@ 0x7f2b7544b540 torch::jit::Unpickler::run()
@ 0x7f2b7544baf1 torch::jit::Unpickler::parse_ivalue()
@ 0x7f2b753ef9c2 torch::jit::readArchiveAndTensors()
@ 0x7f2b753efcdd torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive()
@ 0x7f2b753f2605 torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize()
@ 0x7f2b753f2bd9 torch::jit::load()
@ 0x7f2b753f5455 torch::jit::load()
@ 0x55620f2f4c46 colmap::mvs::PatchMatchNet::InitModule()
@ 0x55620f2f43d6 colmap::mvs::PatchMatchNet::PatchMatchNet()
@ 0x55620ec7c9b0 colmap::mvs::PatchMatchController::ProcessProblem()
@ 0x55620ec8fb63 std::__invoke_impl<>()
@ 0x55620ec8fa50 std::__invoke<>()
@ 0x55620ec8f851 _ZNSt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNS1_17PatchMatchOptionsEmEPS2_S3_mEE6__callIvJEJLm0ELm1ELm2EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
@ 0x55620ec8f367 std::_Bind<>::operator()<>()
@ 0x55620ec8efdd std::__invoke_impl<>()
@ 0x55620ec8ed55 std::__invoke<>()
@ 0x55620ec8ea7d _ZZNSt13__future_base11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNS3_17PatchMatchOptionsEmEPS4_S5_mEESaIiEFvvEE6_M_runEvENKUlvE_clEv
@ 0x55620ec8f436 _ZNKSt13__future_base12_Task_setterISt10unique_ptrINS_7_ResultIvEENS_12_Result_base8_DeleterEEZNS_11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNSA_17PatchMatchOptionsEmEPSB_SC_mEESaIiEFvvEE6_M_runEvEUlvE_vEclEv
@ 0x55620ec8f08c _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EZNS1_11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNSD_17PatchMatchOptionsEmEPSE_SF_mEESaIiEFvvEE6_M_runEvEUlvE_vEEE9_M_invokeERKSt9_Any_data
@ 0x55620eacd258 std::function<>::operator()()
@ 0x55620eacc75e std::__future_base::_State_baseV2::_M_do_set()
@ 0x55620ead4019 std::__invoke_impl<>()
@ 0x55620ead1136 std::__invoke<>()
@ 0x55620eacce3e _ZZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ENKUlvE_clEv
Signal: SIGSEGV (unknown crash reason)
Process finished with exit code 11
This is the error I got with PyTorch 1.6; it might be related:
with open('/home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module_windows.pt', 'br') as f:
model = torch.jit.load(f)
Traceback (most recent call last):
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-d6e3587a7e88>", line 2, in <module>
model = torch.jit.load(f)
File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/jit/__init__.py", line 277, in load
cpp_module = torch._C.import_ir_module_from_buffer(cu, f.read(), map_location, _extra_files)
RuntimeError:
Arguments for call are not valid.
The following variants are available:
aten::upsample_nearest1d.out(Tensor self, int[1] output_size, float? scales=None, *, Tensor(a!) out) -> (Tensor(a!)):
Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.
aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor):
Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.
The original call is:
File "C:\Users\anmatako\AppData\Roaming\Python\Python38\site-packages\torch\nn\functional.py", line 3130
if input.dim() == 3 and mode == 'nearest':
return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
if input.dim() == 4 and mode == 'nearest':
return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
Serialized File "code/__torch__/torch/nn/functional/___torch_mangle_46.py", line 155
_49 = False
if _49:
_51 = torch.upsample_nearest1d(input, output_size3, scale_factors6)
~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_50 = _51
else:
'interpolate' is being compiled since it was called from 'FeatureNet.forward'
Serialized File "code/__torch__/models/net.py", line 139
def forward(self: __torch__.models.net.FeatureNet,
x: Tensor) -> List[Tensor]:
_35 = __torch__.torch.nn.functional.___torch_mangle_46.interpolate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_36 = torch.empty([1], dtype=None, layout=None, device=None, pin_memory=None, memory_format=None)
_37 = torch.empty([1], dtype=None, layout=None, device=None, pin_memory=None, memory_format=None)
Being able to load the module with PyTorch 1.7.1 at least means that the module does not seem to be corrupted. The PyTorch 1.6 issue you see looks like a simple incompatibility with older versions.
As for the error you get when you try to load with LibTorch, I'm quite confused as well, as it should not need any special configs in fstream and should load without issues. Can you try with LibTorch 1.7.1 for CUDA 10.1 and cudnn 7.6.0? That's the same package I'm using and I was wondering if there's something in these dependencies that makes the loading incompatible when doing it from colmap.
I compiled the debug version and it works, which is not very useful. Now I'm compiling in release mode.
I tried setting up CUDA 10.1, but on Ubuntu cuBLAS is missing for this version, so I used 11.0. Not sure what to do from here. Also, PyTorch 1.8 has just been released.
On 2021. Mar 2., Tue at 21:20, Antonios Matakos [email protected] wrote:
Being able to load the module with Pytorch 1.7.1 at least means that the module does not seem to be corrupted. The Pytorch 1.6 issue you see looks like a simple incompatibility with older versions.
As for the error you get when you try to load with LibTorch, I'm quite confused as well, as it should not need any special configs in fstream and should load without issues. Can you try with LibTorch 1.7.1 for CUDA 10.1 and cudnn 7.6.0? That's the same package I'm using and I was wondering if there's something in these dependencies that makes the loading incompatible when doing it from colmap.
With PyTorch 1.7.1 I can read the model file properly. I think the problem is that LibTorch tries to open the file as a text file rather than as binary; that was also one of my problems on the Python side.
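For reference, here is a small self-contained Python illustration (no torch required) of why binary mode matters for a TorchScript archive: a `.pt` module is a zip container, and text-mode reads apply newline translation and character decoding that break arbitrary byte streams. The file names here are made up for the example.

```python
import os
import tempfile
import zipfile

# Build a stand-in "module" file: a zip archive containing every possible
# byte value (including '\r', '\n', and 0x1a), just like the binary
# payloads inside a real TorchScript .pt archive.
path = os.path.join(tempfile.mkdtemp(), "module.pt")
with zipfile.ZipFile(path, "w") as zf:
    zf.writestr("payload.bin", bytes(range(256)))

# Binary mode round-trips the archive exactly; the zip magic "PK" is the
# header a TorchScript loader expects to find first.
with open(path, "rb") as f:
    raw = f.read()
assert raw[:2] == b"PK"

# Text mode fails outright: the stored bytes are not valid UTF-8.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    text_mode_ok = True
except UnicodeDecodeError:
    text_mode_ok = False
print("text-mode read succeeded:", text_mode_ok)  # False
```

This is only a sketch of the failure mode; in C++ the equivalent fix is constructing the `std::ifstream` with `std::ios::binary`, as attempted below.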
I tried explicitly setting the file mode via:

```cpp
std::ifstream model_file(options_.mvs_module_path, std::ios::in | std::ios::binary);
model_[thread_index_] = torch::jit::load(model_file, kDevIn);
```

but I still get the same result.
Probably I'll have to compile a debug version of LibTorch for Linux to get more info. I have little experience with that, but I'll try.
Here is the stack trace:
```
First definition of patch-match module for thread index: 0
Signal: SIGSEGV (signal SIGSEGV: invalid address (fault address: 0x0))
*** Aborted at 1614714901 (unix time) try "date -d @1614714901" if you are using GNU date ***
PC: @ 0x7f2b751ee986 std::__detail::_Executor<>::_M_dfs()
*** SIGSEGV (@0x3e8000044a0) received by PID 17575 (TID 0x7f2b22fc4700) from PID 17568; stack trace: ***
    @ 0x7f2b84b3a631 (unknown)
    @ 0x7f2b8305f3c0 (unknown)
    @ 0x7f2b751ee986 std::__detail::_Executor<>::_M_dfs()
    @ 0x7f2b751eeb53 std::__detail::_Executor<>::_M_dfs()
    @ 0x7f2b751eec6c std::__detail::_Executor<>::_M_dfs()
    @ 0x7f2b751ef412 std::__detail::__regex_algo_impl<>()
    @ 0x7f2b319995fe c10::Device::Device()
    @ 0x7f2b7544963d torch::jit::Unpickler::readInstruction()
    @ 0x7f2b7544b540 torch::jit::Unpickler::run()
    @ 0x7f2b7544baf1 torch::jit::Unpickler::parse_ivalue()
    @ 0x7f2b753ef9c2 torch::jit::readArchiveAndTensors()
    @ 0x7f2b753efcdd torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive()
    @ 0x7f2b753f2605 torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize()
    @ 0x7f2b753f2bd9 torch::jit::load()
    @ 0x7f2b753f5455 torch::jit::load()
    @ 0x55620f2f4c46 colmap::mvs::PatchMatchNet::InitModule()
    @ 0x55620f2f43d6 colmap::mvs::PatchMatchNet::PatchMatchNet()
    @ 0x55620ec7c9b0 colmap::mvs::PatchMatchController::ProcessProblem()
    @ 0x55620ec8fb63 std::__invoke_impl<>()
    @ 0x55620ec8fa50 std::__invoke<>()
    @ 0x55620ec8f851 _ZNSt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNS1_17PatchMatchOptionsEmEPS2_S3_mEE6__callIvJEJLm0ELm1ELm2EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @ 0x55620ec8f367 std::_Bind<>::operator()<>()
    @ 0x55620ec8efdd std::__invoke_impl<>()
    @ 0x55620ec8ed55 std::__invoke<>()
    @ 0x55620ec8ea7d _ZZNSt13__future_base11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNS3_17PatchMatchOptionsEmEPS4_S5_mEESaIiEFvvEE6_M_runEvENKUlvE_clEv
    @ 0x55620ec8f436 _ZNKSt13__future_base12_Task_setterISt10unique_ptrINS_7_ResultIvEENS_12_Result_base8_DeleterEEZNS_11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNSA_17PatchMatchOptionsEmEPSB_SC_mEESaIiEFvvEE6_M_runEvEUlvE_vEclEv
    @ 0x55620ec8f08c _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EZNS1_11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNSD_17PatchMatchOptionsEmEPSE_SF_mEESaIiEFvvEE6_M_runEvEUlvE_vEEE9_M_invokeERKSt9_Any_data
    @ 0x55620eacd258 std::function<>::operator()()
    @ 0x55620eacc75e std::__future_base::_State_baseV2::_M_do_set()
    @ 0x55620ead4019 std::__invoke_impl<>()
    @ 0x55620ead1136 std::__invoke<>()
    @ 0x55620eacce3e _ZZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ENKUlvE_clEv
Signal: SIGSEGV (unknown crash reason)
```
Process finished with exit code 11
This is the error I got with PyTorch 1.6; it might be related:
```
>>> with open('/home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module_windows.pt', 'br') as f:
...     model = torch.jit.load(f)
Traceback (most recent call last):
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 2, in
    model = torch.jit.load(f)
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/jit/__init__.py", line 277, in load
    cpp_module = torch._C.import_ir_module_from_buffer(cu, f.read(), map_location, _extra_files)
RuntimeError: Arguments for call are not valid. The following variants are available:
  aten::upsample_nearest1d.out(Tensor self, int[1] output_size, float? scales=None, *, Tensor(a!) out) -> (Tensor(a!)):
    Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.
  aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor):
    Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.

The original call is:
  File "C:\Users\anmatako\AppData\Roaming\Python\Python38\site-packages\torch\nn\functional.py", line 3130
    if input.dim() == 3 and mode == 'nearest':
        return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    if input.dim() == 4 and mode == 'nearest':
        return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)

Serialized  File "code/__torch__/torch/nn/functional/___torch_mangle_46.py", line 155
    _49 = False
    if _49:
      _51 = torch.upsample_nearest1d(input, output_size3, scale_factors6)
            ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
      _50 = _51
    else:

'interpolate' is being compiled since it was called from 'FeatureNet.forward'
Serialized  File "code/__torch__/models/net.py", line 139
  def forward(self: __torch__.models.net.FeatureNet, x: Tensor) -> List[Tensor]:
    _35 = __torch__.torch.nn.functional.___torch_mangle_46.interpolate
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _36 = torch.empty([1], dtype=None, layout=None, device=None, pin_memory=None, memory_format=None)
    _37 = torch.empty([1], dtype=None, layout=None, device=None, pin_memory=None, memory_format=None)
```
+1 on the integration of third-party learning-based MVS methods.
With the recent popularity of colmap in the greater CV community, and the advances in learning-based SfM and MVS methods, it would be very beneficial for both sides to be able to incorporate methods such as PatchMatchNet, MVSNet, SuperPoint, SuperGlue, etc.
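The generic module interface this PR introduces is what would make such integrations pluggable. As a rough, hypothetical sketch of that `forward()` contract, with `torch.Tensor` stubbed out so the snippet runs without torch installed (a real module would be a `torch.nn.Module` exported via `torch.jit.script` or `torch.jit.trace`):

```python
from typing import List, Tuple


class Tensor:
    """Stand-in for torch.Tensor (illustration only)."""


class GenericMvsModule:
    """Hypothetical shape of a module loadable via the mvs_module_path option."""

    def forward(
        self,
        images: List[Tensor],  # one image tensor per view
        intrinsics: Tensor,    # per-view camera intrinsics
        extrinsics: Tensor,    # per-view camera poses
        depth_params: Tensor,  # e.g. the depth range of the problem
    ) -> Tuple[Tensor, Tensor]:
        """Return (depth, confidence).

        Note that no normal map is part of the output, which is why the PR
        adds computing normals from depth at stereo-fusion time.
        """
        raise NotImplementedError
```

Any network whose scripted `forward` matches this input/output structure could then be swapped in without touching the C++ side.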