
Error when running python main_1v.py --mode train

Open xiaopgaxiaopg opened this issue 3 years ago • 6 comments

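Environment: 18.04, Python 3.7, CPU training, num_worker=0. Dataset: object 003 from the YCB dataset; cloud pc_* files have been generated.

After tracing the code, I found the problem occurs in the training loop:

def train(model, loader, epoch):
    ...
    for batch_idx, (data, target) in enumerate(loader):
        ...
        if len(data) <= 1:
            continue
        ...

print(len(data)) outputs 1, so the if branch is taken, and after that the error is raised, even though the loader contains 16800 samples.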

Log output after running the command:
(grasp) ly@ly:~/code/PointNetGPD$ python main_1v.py --mode train

/home/ly/anaconda3/envs/grasp/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
16800 loader
/home/ly/dataset/PointNetGPD/ycb_grasp/train/panda_003_cracker_box_1400.npy
/home/ly/dataset/PointNetGPD/ycb_rgbd/003_cracker_box/google_512k/nontextured.ply
Traceback (most recent call last):
  File "main_1v.py", line 387, in <module>
    main()
  File "main_1v.py", line 365, in main
    acc_train = train(model, train_loader, epoch_i)
  File "main_1v.py", line 278, in train
    for batch_idx, (data, target) in enumerate(loader):
  File "/home/ly/anaconda3/envs/grasp/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/ly/anaconda3/envs/grasp/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ly/anaconda3/envs/grasp/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "main_1v.py", line 120, in my_collate
    return torch.utils.data.dataloader.default_collate(batch)
  File "/home/ly/anaconda3/envs/grasp/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 46, in default_collate
    elem = batch[0]
IndexError: list index out of range

xiaopgaxiaopg, Jun 30 '22 07:06
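The traceback ends at `elem = batch[0]` inside `default_collate`, which means the list passed to it was empty. One possible way to check whether samples are being lost before collation is to scan the dataset directly. The sketch below is only a diagnostic idea, not code from the repository: `find_empty_samples` is a hypothetical helper, and it assumes the dataset's `__getitem__` returns `None` for samples it rejects or cannot read.

```python
def find_empty_samples(dataset, limit=None):
    """Return the indices whose sample is None (hypothetical diagnostic helper).

    Assumes `dataset` supports len() and integer indexing, and that it
    signals a rejected or unreadable sample by returning None.
    """
    n = len(dataset) if limit is None else min(limit, len(dataset))
    return [i for i in range(n) if dataset[i] is None]


if __name__ == "__main__":
    # Stand-in for a real dataset: a plain list with two rejected samples.
    toy_dataset = [("cloud0", 1), None, ("cloud2", 0), None]
    bad = find_empty_samples(toy_dataset)
    print(f"{len(bad)} of {len(toy_dataset)} samples are None: {bad}")
```

If most indices turn out to be `None`, that would be consistent with a loader that reports 16800 entries but still produces batches of length 0 or 1.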

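I ran into this problem too. Check whether the object categories in ycb_rgbd and ycb_grasp match up. After I deleted the extra files from the train folder of ycb_grasp, the problem went away.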

xiaofeiso, Nov 21 '22 03:11
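The category check suggested above could be automated along these lines. This is a rough sketch under stated assumptions: the file-name pattern is inferred from the single path in the log (`panda_003_cracker_box_1400.npy` mapping to the object directory `003_cracker_box`), and both function names are hypothetical, not part of PointNetGPD.

```python
import re

# Assumed file-name pattern, inferred from one path in the log:
# panda_003_cracker_box_1400.npy -> object name "003_cracker_box".
GRASP_FILE_PATTERN = re.compile(r"^[A-Za-z0-9]+_(\d{3}_.+)_\d+\.npy$")


def object_name_from_grasp_file(file_name):
    """Extract the object name from a grasp file name, or None if it does not match."""
    match = GRASP_FILE_PATTERN.match(file_name)
    return match.group(1) if match else None


def unmatched_grasp_files(grasp_files, object_dirs):
    """Return grasp files whose object has no directory in object_dirs.

    Files that do not follow the assumed pattern are reported as unmatched too.
    """
    known = set(object_dirs)
    return [f for f in grasp_files if object_name_from_grasp_file(f) not in known]


if __name__ == "__main__":
    grasp_files = ["panda_003_cracker_box_1400.npy", "panda_005_tomato_soup_can_12.npy"]
    object_dirs = ["003_cracker_box"]
    print(unmatched_grasp_files(grasp_files, object_dirs))
```

With real data, the two arguments could come from `os.listdir` on the ycb_grasp train folder and the ycb_rgbd folder; any file the function returns is a candidate for removal or for generating the missing object data.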

Hello, I have run into this problem too. Did you manage to solve it, and if so, how? Thanks.

nuo-code, Feb 21 '23 07:02

I only used two objects, 003 and 004.

nuo-code, Feb 21 '23 07:02

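I think part of the dataset itself is missing, so elem in default_collate has nothing to read. I added a check at the start of default_collate: if the length of the incoming data is less than 1, it returns immediately.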

SK8Era, May 05 '23 08:05
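A guard like the one described above can also live in user code rather than in the installed library. The following is a minimal sketch, assuming that rejected samples arrive as `None`; `make_safe_collate` and `iterate_batches` are hypothetical names, and the wrapped `collate_fn` is whatever collate function the loader would otherwise use.

```python
def make_safe_collate(collate_fn):
    """Wrap collate_fn so None samples are dropped and an empty batch yields None."""
    def safe_collate(batch):
        batch = [sample for sample in batch if sample is not None]
        if not batch:
            # Nothing left to collate: return None instead of raising IndexError.
            return None
        return collate_fn(batch)
    return safe_collate


def iterate_batches(loader):
    """Yield (batch_idx, batch) pairs, skipping batches that came back as None."""
    for batch_idx, batch in enumerate(loader):
        if batch is None:
            continue
        yield batch_idx, batch


if __name__ == "__main__":
    # Toy stand-ins: `list` plays the role of the real collate function,
    # and a plain list of pre-collated batches plays the role of the loader.
    safe = make_safe_collate(list)
    print(safe([None, None]))      # empty after filtering
    print(safe([1, None, 2]))      # None sample dropped
    print(list(iterate_batches([[1], None, [2]])))
```

In a PyTorch setup, the wrapped function would presumably be passed through the `collate_fn` argument of `DataLoader`, with the training loop skipping `None` batches as `iterate_batches` does. This only hides the symptom; if many batches are skipped, the dataset itself still needs checking.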

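However, when I train with main_1v_gpd.py, I get an error no matter how much I lower num_workers and batch_size. When num_workers is not 0, the error is a runtime error; when it is 0, the process ends with "Aborted (core dumped)". My machine has an i7-12700KF, a 3060 12 GB GPU, and 32 GB of RAM. How can I fix this? Thanks.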

SK8Era, May 05 '23 09:05

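Replying to SK8Era: "However, when I train with main_1v_gpd.py, I get an error no matter how much I lower num_workers and batch_size. When num_workers is not 0, the error is a runtime error; when it is 0, the process ends with 'Aborted (core dumped)'. My machine has an i7-12700KF, a 3060 12 GB GPU, and 32 GB of RAM. How can I fix this? Thanks."

Comment out num_workers.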

CaptainWuDaoKou, Dec 22 '23 01:12