LiDAR-MOS

Training setups (tested with different GPUs)

Open emilyemliyM opened this issue 2 years ago • 6 comments

Dear author,

Thanks for sharing the code.

I'm trying to reproduce the metrics from the paper, but haven't been successful yet. Could you share the training parameters and hardware used for the experiments? Also, for metrics such as IoU in the paper, do you mean mIoU or just the IoU of the moving class?

Thanks!

emilyemliyM avatar Mar 18 '22 02:03 emilyemliyM

Hey @mengshiyu0109, the training parameters used for the paper are the defaults. We tested on Quadro 4000, 5000, and 6000, RTX 2080 Ti, and TITAN GPUs and got similar results.

The IoU reported in our paper is the one for moving objects only.

Note that the 62 IoU was obtained by adding KNN and semantics. Without semantics, the performance is around 58 IoU on the test set. You may first check whether KNN is enabled in the config file.

@MaxChanger could you please also share your training setup for LMNet here?

Chen-Xieyuanli avatar Mar 18 '22 08:03 Chen-Xieyuanli

Hi @mengshiyu0109, I have trained and tested LMNet on 3*2080Ti and on a 3090, and can generally achieve accuracy similar to that reported in the paper. You could try setting batch_size in salsanext_mos.yml to 24, and then use 3*2080Ti, or more GPU cards with slightly less memory, so that bs=24 is still guaranteed.

In addition, the IoU in the paper should refer specifically to the moving IoU, whereas checkpoints are saved during training based on mean_IoU (the average of static and moving).
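To make the difference between the two metrics concrete, here is a minimal NumPy sketch. It is not taken from the repository: the helper name `per_class_iou`, the label ids (0 = static, 1 = moving), and the toy labels are all assumptions for illustration only.

```python
import numpy as np


def per_class_iou(pred, target, num_classes=2):
    """Return the IoU of each class from flat integer label arrays (hypothetical helper)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predicted classes.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled as the class, but predicted otherwise
    denom = tp + fp + fn
    # Classes that never occur get NaN so they can be skipped when averaging.
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)


# Toy labels; 0 = static, 1 = moving (assumed label ids).
target = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

iou = per_class_iou(pred, target)
moving_iou = iou[1]          # the kind of number reported in the paper
mean_iou = np.nanmean(iou)   # the kind of number used to pick checkpoints

print(f"static IoU = {iou[0]:.4f}, moving IoU = {moving_iou:.4f}, mean IoU = {mean_iou:.4f}")
```

Because static points usually dominate a scan, the mean IoU can look considerably higher than the moving IoU, so the two numbers should not be compared directly.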

By the way, this code may be non-deterministic; you can set the following flags:

import random

import numpy as np
import torch


def set_seed(seed=1024):
    # Seed the Python, NumPy, and PyTorch random number generators.
    random.seed(seed)
    # os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
    # Disable cuDNN autotuning and request deterministic cuDNN kernels.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

MaxChanger avatar Mar 18 '22 12:03 MaxChanger

Thanks a lot. I am still training. By the way, I am focusing only on the moving-class IoU, but I only get about 20%, so I haven't tried the test part yet.

Following your replies, I will try again now. I have more confidence in this topic now; I have tried several methods, but could not get good metrics for the moving class.

Thanks.

emilyemliyM avatar Mar 18 '22 14:03 emilyemliyM

@MaxChanger Thanks for the report!

@mengshiyu0109 you may first check whether you can generate similar results with our pre-trained model to see whether the setup is correct or not.

Chen-Xieyuanli avatar Mar 18 '22 14:03 Chen-Xieyuanli

Thanks a lot for your reply! I would like to ask: what mIoU value did you obtain during training before moving on to testing?

emilyemliyM avatar Mar 19 '22 14:03 emilyemliyM

Hi @mengshiyu0109. During my training, best_val_iou in TensorBoard reaches around 0.84 at epoch ~120 (I guess anything greater than 0.82 should be fine). The non-determinism may also cause some fluctuation. After this, you can use python infer.py xxxx to generate predicted labels and python utils/evaluate_mos.py xxx to evaluate. The moving IoU on the validation set should be around 0.60 (0.59~0.618).

MaxChanger avatar Mar 19 '22 15:03 MaxChanger