model training error

Open. FarmOmics opened this issue 1 year ago • 1 comment

Command:

python -u ./selene/selene_sdk/cli.py train.yml --lr=0.1

Error information:

Traceback (most recent call last):
  File "train.py", line 11, in <module>
    parse_configs_and_run(configs, lr=0.01)
  File "/home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py", line 344, in parse_configs_and_run
    execute(operations, configs, current_run_output_dir)
  File "/home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py", line 188, in execute
    train_model.train_and_validate()
  File "/home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/selene_sdk/train_model.py", line 417, in train_and_validate
    self.train()
  File "/home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/selene_sdk/train_model.py", line 453, in train
    loss.backward()
  File "/home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution (try_all at /opt/conda/conda-bld/pytorch_1591914855613/work/aten/src/ATen/native/cudnn/Conv.cpp:693)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x14c3d6230b5e in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0xd5d68d (0x14c3d775d68d in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0xd5e1d1 (0x14c3d775e1d1 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xd6220b (0x14c3d776220b in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: at::native::cudnn_convolution_backward_input(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0xb2 (0x14c3d7762762 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xdc9280 (0x14c3d77c9280 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xe0db18 (0x14c3d780db18 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x4fa (0x14c3d7763dfa in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0xdc95ab (0x14c3d77c95ab in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: + 0xe0db74 (0x14c3d780db74 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0x29dee26 (0x14c4043dee26 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: + 0x2a2e634 (0x14c40442e634 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x378 (0x14c403ff6ff8 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: + 0x2ae7df5 (0x14c4044e7df5 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x14c4044e50f3 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x14c4044e5ed2 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_init(int) + 0x39 (0x14c4044de549 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x14c407f0a638 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #18: + 0xd3e79 (0x14c41efd3e79 in /home/user/.conda/envs/selene_sdk/lib/python3.7/site-packages/matplotlib/../../../libstdc++.so.6)
frame #19: + 0x94b43 (0x14c42c894b43 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: + 0x126a00 (0x14c42c926a00 in /lib/x86_64-linux-gnu/libc.so.6)

FarmOmics avatar Jun 09 '23 02:06 FarmOmics