Question about the training performance

Open Jenna-Bai opened this issue 6 months ago • 4 comments

Thanks so much for the release of code and dataset.

I tried to retrain MVSSNet and TruFor on the OpenSDI dataset under the IMDLBenCo framework. After 3 epochs, the test_pixel-level Accuracy increases a lot, but F1 and IoU drop to lower values.

Is there anything I need to change in train.py? I noticed that you report IoU and F1 in the paper; should I change the validation metrics?

Jenna-Bai avatar Jun 18 '25 13:06 Jenna-Bai

That's an excellent observation, and it is a classic pixel-level label imbalance problem. Because real pixels vastly outnumber manipulated ones in this task, a model can reach high accuracy early in training simply by predicting every pixel as 'real'. F1 and IoU, on the other hand, are crucial because they measure the model's ability to find the fake pixels, which is our actual goal. What you're seeing is normal for the first few epochs; trust the F1/IoU scores, as they are the true performance indicators.
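
To make the imbalance effect concrete, here is a minimal sketch in plain Python. It is not the IMDLBenCo evaluator: the helper `pixel_metrics`, the 100-pixel toy masks, and the convention of returning 0.0 for F1/IoU when there are no positive pixels are all assumptions made for illustration.

```python
def pixel_metrics(pred, target):
    """Pixel-level accuracy, F1, and IoU for flat binary masks (1 = manipulated).

    Hypothetical helper for illustration only; F1 and IoU are defined as 0.0
    when their denominators are zero (an assumed convention).
    """
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    accuracy = (tp + tn) / len(target)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return accuracy, f1, iou


# Toy image: 100 pixels, of which only 5 are manipulated.
target = [1] * 5 + [0] * 95

# Model A predicts every pixel as real.
all_real = [0] * 100

# Model B finds 3 of the 5 fake pixels but raises 4 false alarms.
partial = [1] * 3 + [0] * 2 + [1] * 4 + [0] * 91

print("all real:", pixel_metrics(all_real, target))
print("partial: ", pixel_metrics(partial, target))
```

In this toy case the all-'real' predictor has the higher accuracy (0.95 versus 0.94), yet its F1 and IoU are both 0, while the predictor that actually localizes fake pixels scores F1 = 0.5 and IoU = 1/3.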

iamwangyabin avatar Jun 19 '25 06:06 iamwangyabin

Could you please share roughly how many hours you trained these models?

I trained MVSSNet and SparseViT for a day using IMDLBenCo, but the performance doesn't improve, and there is no difference between intra-dataset and cross-dataset results.

I posted the performance below; could you please give me some advice?

Image

Image

Jenna-Bai avatar Jun 23 '25 15:06 Jenna-Bai

Very, very slow. As far as I remember, I trained TruFor for 2 days on a 4×H100 cluster. Your F1 on SD1.5 is a bit low. I suggest you check whether the training loss is still decreasing.
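
One simple way to run that check is to compare the average loss over the most recent epochs with the average over the epochs just before them. The sketch below is only an illustration: the function `loss_still_decreasing`, its `window` and `min_rel_drop` parameters, and the example loss values are hypothetical and not part of IMDLBenCo or OpenSDI.

```python
def loss_still_decreasing(losses, window=3, min_rel_drop=0.01):
    """Return True if the mean of the last `window` losses is lower than the
    mean of the preceding `window` losses by more than `min_rel_drop`
    (relative to the earlier mean).

    Hypothetical helper; the thresholds are arbitrary illustrative defaults.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")

    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window

    if previous == 0:
        return False
    return (previous - recent) / abs(previous) > min_rel_drop


# Made-up per-epoch training losses, for illustration only.
improving = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
plateaued = [0.50, 0.51, 0.49, 0.50, 0.50, 0.50]

print(loss_still_decreasing(improving))
print(loss_still_decreasing(plateaued))
```

If the loss has plateaued while F1 is still low, longer training alone is unlikely to help; if it is still dropping, the run may simply need more time.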

iamwangyabin avatar Jun 24 '25 05:06 iamwangyabin

Thanks so much for your reply! I tried the training code for MaskCLIP, but all of its losses suddenly turn to NaN at epoch 8. Have you encountered this problem, and do you have any idea how to fix it?

Jenna-Bai avatar Jul 02 '25 13:07 Jenna-Bai