
Problem encountered with train_pretrain

Open CoderXuans opened this issue 11 months ago • 1 comments

@deepcs233 Hello, after running train.sh on a single GPU I ran into two problems.

1. The first run failed with:

File "train_pretrain.py", line 821, in __call__
prob_det = torch.sigmoid(output[:, :, 0] * (1 - 2 * target[:, :, 0]))
IndexError: too many indices for tensor of dimension 2

I changed
prob_det = torch.sigmoid(output[:, :, 0] * (1 - 2 * target[:, :, 0]))
to
prob_det = torch.sigmoid(output[:, 0] * (1 - 2 * target[:, :, 0]))

2. After running train.sh again with that change, I got a new error:

Loading image: /home/syl/LMDrive/autodl-tmp/DATASET_ROOT/data/routes_town02_long_w16_08_15_21_31_40/rgb_full/5235.jpg
Loading image: /home/syl/LMDrive/autodl-tmp/DATASET_ROOT/data/routes_town02_long_w11_08_15_15_54_43/rgb_full/5449.jpg
Traceback (most recent call last):
File "train_pretrain.py", line 1873, in <module>
main()
File "train_pretrain.py", line 1273, in main
train_metrics = train_one_epoch(
File "train_pretrain.py", line 1413, in train_one_epoch
loss_traffic, loss_velocity = loss_fns["traffic"](output[0], target[4])
File "train_pretrain.py", line 821, in __call__
prob_det = torch.sigmoid(output[:, 0] * (1 - 2 * target[:, :, 0]))
RuntimeError: The size of tensor a (106) must match the size of tensor b (2500) at non-singleton dimension 1
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 497603) of binary: /home/mepus/anaconda3/envs/lmdrive_pyc2.0.1_py3.8/bin/python3
Traceback (most recent call last):

Is my change for problem 1 correct? And is there a way to fix problem 2? Looking forward to your quick reply.
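Read together, the two error messages suggest that the tensor passed as output[0] does not have the layout the loss expects: the original line indexes both tensors as [:, :, 0], so both need three dimensions, while the second error implies that output is 2-D with a first dimension of 106 and that target has a second dimension of 2500. Changing the indexing therefore probably hides the mismatch rather than fixing it. Below is a minimal sketch of a shape check that could be called just before the loss. The helper name check_traffic_shapes and the sizes 8, 4, and 7 in the example are hypothetical, and the requirement that the first two dimensions agree is an assumption read off the indexing, not something confirmed by the LMDrive code.

```python
def check_traffic_shapes(output_shape, target_shape):
    """Return a list of shape problems; an empty list means the shapes look compatible.

    Assumption: the loss indexes both tensors as [:, :, 0], so both must be
    3-D and must agree on their first two dimensions.
    """
    problems = []
    if len(output_shape) != 3:
        problems.append(f"output has {len(output_shape)} dims, expected 3: {output_shape}")
    if len(target_shape) != 3:
        problems.append(f"target has {len(target_shape)} dims, expected 3: {target_shape}")
    # Only compare the leading dimensions once both tensors are known to be 3-D.
    if not problems and tuple(output_shape[:2]) != tuple(target_shape[:2]):
        problems.append(
            f"leading dims differ: output {tuple(output_shape[:2])} "
            f"vs target {tuple(target_shape[:2])}"
        )
    return problems


# Shapes consistent with the reported errors: a 2-D output whose first
# dimension is 106 and a 3-D target whose second dimension is 2500.
# The remaining sizes (8, 4, 7) are made up for illustration.
for problem in check_traffic_shapes((106, 8), (4, 2500, 7)):
    print(problem)
```

With PyTorch tensors, the call would be check_traffic_shapes(tuple(output[0].shape), tuple(target[4].shape)). Printing these shapes on both the cloud server and the local machine should show whether the model output or the data loading differs between the two environments.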

CoderXuans, Jan 09 '25 03:01

Training on a cloud server works fine, but training on my own local machine runs into this problem.

CoderXuans, Jan 10 '25 01:01