AdaBins
Train: 0%
At first I noticed that I didn't have "pytorch3d", so I ran "pip install pytorch3d", but it failed with an error. I then ran "pip uninstall pytorch3d" and downloaded the package from https://anaconda.org/pytorch3d/pytorch3d/files. Now, when training, the output always stays at "Epoch: 1/25. Loop: Train: 0% 0/11579 [03:37<?, ?it/s]". I found that the program stops at this line: loss.backward().
What could be the problem? I am using CUDA 9.2 because my driver version is outdated. Looking forward to your help, thanks!
You can try unsetting the args.distributed flag.
Please refer to instructions provided here to install pytorch3d.
If you can't install pytorch3d for your driver version, you could also try pytorch3d-nightly.
As @eugenelyj pointed out, try unsetting the distributed flag. You may get a better traceback.
Hello, I had the same problem. When I ran 'python train.py args_train_nyu.txt', the program stopped here. Can you help me?
Excuse me, did you solve this problem?
Did you solve the problem?
I think I switched to a different server. On a server with 4 GPUs, the error no longer appeared.
Would it be possible to add you on WeChat to ask a few questions? Mine is SemiMobile.
I am using my own dataset, with raw_image and depth image. Which file is the input.txt that the author mentions, and what does the 857.47 after it mean?

    Traceback (most recent call last):
      File "train.py", line 403, in <module>
        mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
      File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
        while not context.join():
      File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
        raise Exception(msg)
    Exception:

    -- Process 1 terminated with the following error:
    Traceback (most recent call last):
      File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
        fn(i, *args)
      File "/root/autodl-tmp/UDepth-master/train.py", line 109, in main_worker
        experiment_name=args.name, optimizer_state_dict=None)
      File "/root/autodl-tmp/UDepth-master/train.py", line 178, in train
        args) else enumerate(train_loader):
      File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
        data = self._next_data()
      File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
        data = self._dataset_fetcher.fetch(index) # may raise StopIteration
      File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/root/autodl-tmp/UDepth-master/dataloader.py", line 87, in __getitem__
        focal = float(sample_path.split()[2])
    IndexError: list index out of range
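It's a focal-length problem. The lines in your split file almost certainly don't have the focal length in pixels at the end, so it can't be read and the error is raised.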
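To check a split file before training, something like the sketch below may help. It relies only on what the traceback shows: the loader calls `sample_path.split()[2]` and converts the result with `float`, so every line needs at least three whitespace-separated fields and the third must be numeric (presumably the focal length, like the 857.47 asked about above). The function name `find_bad_lines` and the file name in the usage note are hypothetical, and what the first two fields mean is not confirmed by this thread.

```python
from pathlib import Path


def find_bad_lines(list_file, focal_index=2):
    """Return (line_number, text) pairs for lines whose focal field is missing or non-numeric.

    Mirrors the failing statement in the traceback:
        focal = float(sample_path.split()[2])
    """
    bad = []
    lines = Path(list_file).read_text().splitlines()
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        # Too few fields reproduces the IndexError from the traceback
        # (an empty line has zero fields, so it is flagged as well).
        if len(fields) <= focal_index:
            bad.append((lineno, line))
            continue
        # A non-numeric third field would raise ValueError in the loader instead.
        try:
            float(fields[focal_index])
        except ValueError:
            bad.append((lineno, line))
    return bad
```

For example, `find_bad_lines("my_train_files.txt")` returns an empty list when every line can be parsed; otherwise it lists the line numbers that need a focal length appended.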