Ocean training problem

Open tm9161 opened this issue 3 years ago • 7 comments

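Hello, I have a question. Following the tutorial, I ran python tracking/onekey.py (running train_ocean.py on its own gives the same error) and got the error below. I don't know what the problem is.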

Traceback (most recent call last):
  File "./tracking/train_ocean.py", line 259, in <module>
    main()
  File "./tracking/train_ocean.py", line 250, in main
    model, writer_dict = ocean_train(train_loader, model, optimizer, epoch + 1, curLR, config, writer_dict, logger, device=device)
  File "/data/code/siam/TracKit/tracking/../lib/core/function.py", line 54, in ocean_train
    loss.backward()
  File "/home/tm/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/tm/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
  File "/home/tm/anaconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/home/tm/anaconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 210, in wrapper
    outputs = fn(ctx, *args)
  File "/data/code/siam/TracKit/tracking/../lib/models/dcn/deform_conv.py", line 85, in backward
    deform_conv_cuda.deform_conv_backward_parameters_cuda(
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

tm9161 avatar Jun 07 '21 08:06 tm9161
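
Note for readers hitting the same RuntimeError: the message is PyTorch's standard complaint when .view() is asked to reinterpret a tensor whose memory layout (size and stride) cannot be described by the requested shape without copying data. The sketch below is a simplified, pure-Python illustration of that idea and is not code from TracKit. The helper names contiguous_strides and is_contiguous are made up for this example, and PyTorch's real compatibility rule is more permissive than a plain contiguity check.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (C-order) strides, in elements, for a dense tensor of `shape`."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """True if `strides` describe a dense row-major layout for `shape`.

    Dimensions of size 1 are ignored because their stride is never used.
    """
    expected = contiguous_strides(shape)
    return all(d == 1 or s == e for d, s, e in zip(shape, strides, expected))


# A dense 2x3 tensor is stored row by row, so its strides are (3, 1).
dense_shape = (2, 3)
dense_strides = contiguous_strides(dense_shape)

# Transposing swaps shape and strides without moving any data:
# the result has shape (3, 2) but strides (1, 3) instead of the dense (2, 1).
transposed_shape = (3, 2)
transposed_strides = (1, 3)

print(is_contiguous(dense_shape, dense_strides))            # True
print(is_contiguous(transposed_shape, transposed_strides))  # False
```

In PyTorch itself, a tensor in the second situation cannot be flattened with .view(), while .reshape() or .contiguous().view() succeeds by copying when needed. In this thread the failing call sits inside the compiled deform_conv_cuda extension rather than in Python code, so applying that change would presumably mean editing the extension source and rebuilding it; whether that is sufficient here is not confirmed by anyone in the thread.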

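It looks like deform conv was not compiled successfully. Check whether your environment matches install.sh. Alternatively, remove align and train the model without it.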

JudasDie avatar Jun 18 '21 02:06 JudasDie

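I recompiled deform_conv and it still fails. I checked the configuration in install.sh: because I am using a 30-series card, my CUDA is 11.1 and torch is 1.8.1, and mpi4py failed to install. Everything else was installed as in install.sh. I don't know whether any of this is related.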

I'll keep looking into it myself. Thanks for the reply.

tm9161 avatar Jun 21 '21 15:06 tm9161
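
Note for readers comparing their setup against install.sh: a small script like the one below can collect the versions that matter when building a CUDA extension such as deform_conv. It is a minimal sketch, not part of TracKit. The function name describe_environment is made up for this example, and it only uses standard PyTorch calls (torch.version.cuda, torch.cuda.get_device_name, torch.cuda.get_device_capability).

```python
import importlib
import platform


def describe_environment() -> dict:
    """Collect version info relevant to building a CUDA extension.

    Returns a plain dict so it can be printed or pasted into an issue.
    """
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "gpu": None,
        "compute_capability": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info  # PyTorch is not installed; the remaining fields stay None.

    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The compute capability is worth checking on newer cards: RTX 30-series GPUs report (8, 6), which needs a CUDA toolkit of 11.1 or later, so an extension built against the versions pinned for older hardware may not match. Whether that mismatch explains the error in this thread is an open question here.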

@tm9161 Hello, did you manage to solve this problem? I'm running into the same issue.

Jee-King avatar Aug 21 '21 08:08 Jee-King

@JudasDie Hello, setting the align parameter to False makes this error go away. Does this parameter have a significant impact on performance?

Jee-King avatar Aug 21 '21 09:08 Jee-King

No, I didn't solve it. I also just set it to False.

tm9161 avatar Aug 23 '21 07:08 tm9161

@tm9161 Hello, I'm on a 3080 Ti + cudatoolkit 11.1 + torch 1.8, and the build already fails at the python setup.py develop step. I suspect the CUDA version is too new. Did you run into this? How did you solve it?

l-sf avatar Jul 12 '22 12:07 l-sf

Please refer to the new repo, JudasDie/SOTS. Thx.

--
From: Zhang Zhipeng
Institution: National Laboratory of Pattern Recognition
Address: 95 Zhongguancun East Road, 100190, BEIJING, CHINA
Email: @.***

Best Wishes

JudasDie avatar Jul 12 '22 12:07 JudasDie