
Training on my own dataset

Open 8umpk1n opened this issue 2 years ago • 7 comments

Hello, I'm a beginner who has just started working with point clouds, and I haven't found much material about training. To train on my own dataset, I replaced the point cloud files and txt files in the ShapeNet-55 folder with my own data files, but training fails with the error below. How should I normalize my own point cloud data so that it lines up with the data in ShapeNet-55? qwq

[DATASET] Open file data/ShapeNet55-34/ShapeNet-55/train.txt
[DATASET] 4 instances were loaded
[DATASET] Open file data/ShapeNet55-34/ShapeNet-55/test.txt
[DATASET] 10518 instances were loaded
2022-11-01 20:36:45,313 - MODEL - INFO - Transformer with knn_layer 1
2022-11-01 20:36:47,342 - PoinTr - INFO - Using Data parallel ...
/home/a/anaconda3/envs/10.1/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
/home/a/anaconda3/envs/10.1/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:154: UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
Traceback (most recent call last):
  File "main.py", line 68, in main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/home/a/PoinTr/tools/runner.py", line 142, in run_net
    train_writer.add_scalar('Loss/Epoch/Sparse', losses.avg(0), epoch)
  File "/home/a/PoinTr/utils/AverageMeter.py", line 42, in avg
    return self._sum[idx] / self._count[idx]
ZeroDivisionError: division by zero

8umpk1n, Nov 01 '22 12:11

This doesn't look like a data problem. It seems your AverageMeter never counted anything, so its count is 0, which is why you get the division-by-zero error.

  • Did you skip training yourself?
  • Or is bs > 4, so that drop_last caused the whole training phase to be skipped?
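
The second possibility can be checked with simple arithmetic. The following is a minimal sketch, not the repository's code: `num_batches` and `SimpleAverageMeter` are hypothetical stand-ins, and the batch size of 8 is an assumed value chosen only because it is larger than the 4 loaded training instances.

```python
def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a loader yields per epoch."""
    if drop_last:
        # An incomplete final batch is discarded.
        return num_samples // batch_size
    # Ceiling division: the incomplete final batch is kept.
    return -(-num_samples // batch_size)


class SimpleAverageMeter:
    """Hypothetical stand-in for a running-average meter."""

    def __init__(self) -> None:
        self._sum = 0.0
        self._count = 0

    def update(self, value: float) -> None:
        self._sum += value
        self._count += 1

    def avg(self) -> float:
        # Raises ZeroDivisionError if update() was never called.
        return self._sum / self._count


def run_epoch(num_samples: int, batch_size: int, drop_last: bool) -> float:
    """Pretend to train for one epoch and return the average loss."""
    meter = SimpleAverageMeter()
    for _ in range(num_batches(num_samples, batch_size, drop_last)):
        meter.update(1.0)  # placeholder loss value
    return meter.avg()
```

With 4 samples, an assumed batch size of 8, and `drop_last=True`, `num_batches` returns 0, the loop body never runs, and `run_epoch` raises `ZeroDivisionError`. Lowering the batch size to 4 gives exactly one batch per epoch.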

yuxumin, Nov 01 '22 12:11

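Thank you so much! Yes, that was exactly it. I changed bs to 4 and that solved the problem. However, a new problem appeared during training: the .pth files were generated, but the error below occurred in the next epoch. What is going on here? How can I fix it, and does it affect my training results? qaq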

2022-11-02 19:13:39,163 - PoinTr - INFO - Using Data parallel ...
2022-11-02 19:13:40,184 - PoinTr - INFO - [Epoch 0/2][Batch 1/1] BatchTime = 1.013 (s) DataTime = 0.097 (s) Losses = ['106.8453', '200.6035'] lr = 0.000500
/home/a/anaconda3/envs/10.1/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:154: UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
2022-11-02 19:13:40,214 - PoinTr - INFO - [Training] EPOCH: 0 EpochTime = 1.043 (s) Losses = ['106.8453', '200.6035']
2022-11-02 19:13:47,207 - PoinTr - INFO - Save checkpoint at ./experiments/PoinTr/ShapeNet55_models/e/ckpt-last.pth
2022-11-02 19:13:47,612 - PoinTr - INFO - Save checkpoint at ./experiments/PoinTr/ShapeNet55_models/e/ckpt-epoch-000.pth
2022-11-02 19:13:47,925 - PoinTr - INFO - [Epoch 1/2][Batch 1/1] BatchTime = 0.312 (s) DataTime = 0.100 (s) Losses = ['115.6831', '206.9640'] lr = 0.000500
2022-11-02 19:13:47,961 - PoinTr - INFO - [Training] EPOCH: 1 EpochTime = 0.347 (s) Losses = ['115.6831', '206.9640']
2022-11-02 19:13:47,961 - PoinTr - INFO - [VALIDATION] Start validating epoch 1
Traceback (most recent call last):
  File "main.py", line 68, in main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/home/a/PoinTr/tools/runner.py", line 149, in run_net
    metrics = validate(base_model, test_dataloader, epoch, ChamferDisL1, ChamferDisL2, val_writer, args, config, logger=logger)
  File "/home/a/PoinTr/tools/runner.py", line 217, in validate
    input_pc = misc.get_ptcloud_img(input_pc)
  File "/home/a/PoinTr/utils/misc.py", line 190, in get_ptcloud_img
    ax = fig.gca(projection=Axes3D.name, adjustable='box')
TypeError: gca() got an unexpected keyword argument 'projection'

8umpk1n, Nov 02 '22 11:11

I've updated misc.py; it should work now.
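
For context, the TypeError arises because recent Matplotlib releases no longer accept keyword arguments in `Figure.gca()`. A common replacement is to create the 3D axes with `add_subplot`. The sketch below illustrates that pattern only; `render_point_cloud` is a hypothetical helper, and this is not necessarily the change that was made in misc.py.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is required

import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers '3d' on older Matplotlib)


def render_point_cloud(points: np.ndarray) -> np.ndarray:
    """Render an (N, 3) point cloud to an RGB image array (hypothetical helper)."""
    fig = plt.figure(figsize=(4, 4))
    # Create the 3D axes explicitly instead of fig.gca(projection=...).
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=2)
    ax.set_axis_off()
    fig.canvas.draw()
    # The Agg canvas exposes its pixels as an RGBA buffer; drop the alpha channel.
    image = np.asarray(fig.canvas.buffer_rgba())[..., :3].copy()
    plt.close(fig)
    return image
```

The returned array has shape (height, width, 3) and dtype uint8, which is a convenient form for logging to an image writer.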

yuxumin, Nov 02 '22 12:11

Maybe my dataset really is the problem. I converted obj files to txt and then converted those to npy. Training fails with the error below, but everything works with the original data, so I don't know where I went wrong when building the dataset.

2022-11-03 11:05:22,927 - PoinTr - INFO - [VALIDATION] Start validating epoch 1
2022-11-03 11:05:24,236 - PoinTr - INFO - [Validation] EPOCH: 1 Metrics = ['0.2723', '1687.6722', '14483.1594']
2022-11-03 11:05:24,236 - PoinTr - INFO - ============================ TEST RESULTS ============================
2022-11-03 11:05:24,236 - PoinTr - INFO - Taxonomy #Sample F-Score CDL1 CDL2 #ModelName
Traceback (most recent call last):
  File "main.py", line 68, in main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/home/a/PoinTr/tools/runner.py", line 149, in run_net
    metrics = validate(base_model, test_dataloader, epoch, ChamferDisL1, ChamferDisL2, val_writer, args, config, logger=logger)
  File "/home/a/PoinTr/tools/runner.py", line 260, in validate
    msg += shapenet_dict[taxonomy_id] + '\t'
KeyError: '1'
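
The `KeyError: '1'` in this traceback means the taxonomy ID `'1'` used by the custom data has no entry in `shapenet_dict`. The sketch below shows one way to detect such IDs before training and to fall back to a placeholder name when printing results. The function names and the example mapping are illustrative assumptions, not the repository's code.

```python
from typing import Dict, Iterable, List


def find_unknown_taxonomies(
    taxonomy_ids: Iterable[str], taxonomy_names: Dict[str, str]
) -> List[str]:
    """Return the sorted taxonomy IDs that have no entry in the name mapping."""
    return sorted({tid for tid in taxonomy_ids if tid not in taxonomy_names})


def taxonomy_label(taxonomy_id: str, taxonomy_names: Dict[str, str]) -> str:
    """Look up a display name, falling back to a placeholder instead of raising."""
    return taxonomy_names.get(taxonomy_id, f"unknown({taxonomy_id})")


# Illustrative mapping and IDs (assumed values for this example only).
example_names = {"02691156": "airplane"}
example_ids = ["02691156", "1", "1"]
```

Here `find_unknown_taxonomies(example_ids, example_names)` reports `['1']`, so the options are to add an entry for that ID to the mapping or to rename the custom samples so they use an ID that already exists in it.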

8umpk1n, Nov 03 '22 03:11

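I just want to train a simple model and then visualize the point cloud completion results. I'll give the PCN model a try next. Or is it that these point clouds don't need any uniform preprocessing when the dataset is built? Thanks!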
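
On the question of uniform preprocessing: the thread does not state which normalization ShapeNet-55 expects, so the following is only a sketch of one widely used convention, namely centering each cloud at its centroid and scaling it to fit inside the unit sphere. Whether this matches the dataset used here is an assumption that should be checked against the repository's data loader. The function name `normalize_to_unit_sphere` is hypothetical.

```python
import numpy as np


def normalize_to_unit_sphere(points: np.ndarray) -> np.ndarray:
    """Center an (N, 3) point cloud at its centroid and scale it into the unit sphere."""
    points = np.asarray(points, dtype=np.float64)
    centered = points - points.mean(axis=0)
    # Distance of the farthest point from the centroid.
    scale = np.linalg.norm(centered, axis=1).max()
    if scale == 0.0:
        # Degenerate cloud (all points identical): nothing to scale.
        return centered
    return centered / scale
```

Applying the same normalization to every sample keeps clouds exported from different tools at a comparable scale, which matters because Chamfer-distance losses depend on the coordinate scale.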

8umpk1n, Nov 03 '22 03:11

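> I just want to train a simple model and then visualize the point cloud completion results. I'll give the PCN model a try next. Or is it that these point clouds don't need any uniform preprocessing when the dataset is built? Thanks!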

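When you matched your own dataset to ShapeNet-55, how did you unify the number of points? Did you modify the network, or did you restrict the number of points in your own dataset?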

Zhengchao97201, Jul 27 '23 01:07

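> > I just want to train a simple model and then visualize the point cloud completion results. I'll give the PCN model a try next. Or is it that these point clouds don't need any uniform preprocessing when the dataset is built? Thanks!

> When you matched your own dataset to ShapeNet-55, how did you unify the number of points? Did you modify the network, or did you restrict the number of points in your own dataset?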

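Just downsample to a consistent number of points. As I recall, the code already includes a ready-made FPS (farthest point sampling) operation, so you can take a look at that.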
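
To illustrate what such downsampling does, here is a plain NumPy sketch of farthest point sampling. It is not the repository's FPS operation; the function name, the `seed` parameter, and the target size used in the usage note are assumptions for illustration.

```python
import numpy as np


def farthest_point_sample(points: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Pick n_samples points from an (N, 3) cloud so that they spread out evenly."""
    n_points = points.shape[0]
    if n_samples > n_points:
        raise ValueError("n_samples must not exceed the number of input points")

    rng = np.random.default_rng(seed)
    selected = np.empty(n_samples, dtype=np.int64)
    # Squared distance from every point to its nearest already-selected point.
    min_sq_dist = np.full(n_points, np.inf)
    current = int(rng.integers(n_points))  # random starting point

    for i in range(n_samples):
        selected[i] = current
        diff = points - points[current]
        sq_dist = np.einsum("ij,ij->i", diff, diff)
        min_sq_dist = np.minimum(min_sq_dist, sq_dist)
        # The next point is the one farthest from everything selected so far.
        current = int(np.argmax(min_sq_dist))

    return points[selected]
```

For example, `farthest_point_sample(cloud, 2048)` reduces a larger cloud to 2048 points (an arbitrary target chosen for this example). This loop costs O(N × n_samples), so for large clouds a GPU implementation is usually preferable.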

Rogerlv51, Aug 01 '23 03:08