Question about input normalization
Congratulations on the great work showcased with DFormer! I have a few questions related to fine-tuning your models on a new dataset. What kind of normalization should I apply to my RGB images and depth maps before feeding them into the DFormer encoder? For RGB I assume a standard 0-1 normalization per image (i.e. dividing by 255), but for depth maps I don't see an obvious strategy. Moreover, I would probably obtain very bad results if your model was trained with a different normalization strategy, so I would appreciate any clarification. Thanks a lot! Best regards,
Lorenzo Mazza
Thanks for your attention to our work!
If you are fine-tuning our trained weights on a new dataset, we recommend keeping the normalization consistent with our pretraining (code):
transforms.Normalize(mean=torch.tensor(mean), std=torch.tensor(std))
Here the mean and std are IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406, 0.48) and IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225, 0.28). The first three constants are for RGB, while the last one is for depth.
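For concreteness, here is a minimal sketch of applying these statistics with torchvision. The tensor shapes are placeholders, and I assume both RGB and depth have already been scaled to [0, 1] (for depth, e.g., by dividing by its maximum value); please check the dataset loaders in our repo for the exact scaling used there.

import torch
import torchvision.transforms as transforms

IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406, 0.48)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225, 0.28)

# Split the four-channel statistics: the first three are for RGB, the last one for depth.
rgb_norm = transforms.Normalize(mean=IMAGENET_DEFAULT_MEAN[:3], std=IMAGENET_DEFAULT_STD[:3])
depth_norm = transforms.Normalize(mean=IMAGENET_DEFAULT_MEAN[3:], std=IMAGENET_DEFAULT_STD[3:])

# Placeholder inputs: RGB as a (3, H, W) tensor and depth as a (1, H, W) tensor, both in [0, 1].
rgb = torch.rand(3, 480, 640)
depth = torch.rand(1, 480, 640)

rgb = rgb_norm(rgb)        # per-channel (x - mean) / std
depth = depth_norm(depth)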
I hope this helps. If it does not work, feel free to contact us for further discussion.
Best regards, Bowen Yin
Hi Yin, thank you so much for the prompt answer and the clarification! Your answer was super helpful. I will follow up if I have additional questions. Best,
Lorenzo