EdgeNeXt
Does it take long to train this model?
Hi, I'm using EdgeNeXt as my backbone for feature extraction in an image classification task, but I find it very slow to converge (loss of 35 vs. 7 for efficientnetb0), so I'm not sure whether the model doesn't fit my data or my config is wrong. Can anyone share some experience training this type of model? Thanks :D
Hi @leduy99,
Thank you for your interest in EdgeNeXt. Could you provide some more information about your experiment, such as the dataset and detection network? At what input resolution are you training your detector? Also, please make sure that the backbone weights for EdgeNeXt are loaded correctly.
Thanks
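One way to check that backbone weights are loaded correctly is to compare the parameter names the model expects against the names stored in the checkpoint. The sketch below is only an illustration: the helper `compare_state_dict_keys`, the `backbone.` prefix, and all parameter names are hypothetical and not taken from the EdgeNeXt code. With PyTorch, the two key lists would typically come from `model.state_dict().keys()` and from the loaded checkpoint dictionary.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys, prefix=""):
    """Return (missing, unexpected) parameter names.

    `prefix` is stripped from checkpoint keys that start with it, which
    covers the common case where a checkpoint was saved from a wrapper
    module. Both the helper and the prefix are assumptions for this sketch.
    """
    stripped = {
        key[len(prefix):] if prefix and key.startswith(prefix) else key
        for key in checkpoint_keys
    }
    expected = set(model_keys)
    missing = sorted(expected - stripped)      # in the model, absent from the checkpoint
    unexpected = sorted(stripped - expected)   # in the checkpoint, unknown to the model
    return missing, unexpected


# Hypothetical parameter names, for illustration only.
model_keys = ["stem.0.weight", "stem.0.bias", "head.weight"]
checkpoint_keys = [
    "backbone.stem.0.weight",
    "backbone.stem.0.bias",
    "backbone.head.fc.weight",
]

missing, unexpected = compare_state_dict_keys(
    model_keys, checkpoint_keys, prefix="backbone."
)
print("missing:", missing)        # ['head.weight']
print("unexpected:", unexpected)  # ['head.fc.weight']
```

If either list is non-empty for the backbone layers, part of the network is probably still at its random initialization, which alone can explain slow convergence.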
Hi, currently I'm using an exact implementation of edgenext_small. My input resolution is (3, 112, 112), I'm training on a set of 4 million images, and I start with a learning rate of 6e-3. The loss is still decreasing and accuracy is still improving, but it feels slow compared to some other pure CNN networks :D
Thank you for providing the details,
We do notice in our detection experiments that EdgeNeXt is sensitive to LR and usually works well at a slightly lower LR than pure ViTs. If you can afford it, try tuning the LR for your training, for example by trying out some randomly sampled values around the current LR value. I hope this is helpful.
Please let me know if you have any questions or get any useful insights out of your experiments. Thanks
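The suggestion to try randomly sampled values around the current LR could be sketched as follows. This is a minimal illustration and not the authors' tuning procedure: the helper `sample_learning_rates`, the log-uniform sampling, and the choice to search from one tenth of the current value up to the current value (6e-3 in this thread) are all assumptions.

```python
import math
import random


def sample_learning_rates(center_lr, n_samples=5, low_factor=0.1,
                          high_factor=1.0, seed=0):
    """Draw candidate learning rates log-uniformly around `center_lr`.

    The range [center_lr * low_factor, center_lr * high_factor] leans
    toward lower values by default; the factors are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the candidates are reproducible
    low = math.log(center_lr * low_factor)
    high = math.log(center_lr * high_factor)
    # Sampling uniformly in log space spreads candidates evenly across
    # orders of magnitude instead of clustering near the upper bound.
    return sorted(math.exp(rng.uniform(low, high)) for _ in range(n_samples))


candidates = sample_learning_rates(6e-3)
for lr in candidates:
    print(f"candidate LR: {lr:.2e}")
```

Each candidate would then be used for a short training run, keeping the value with the best validation loss for the full run.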
Thanks. Following your advice, I tried a slightly lower LR (1e-3) and a different optimizer (AdamW instead of SGD as before), and the results are a lot better now :D I'm still waiting to see the fully trained model's performance and will give some feedback later. Anyway, thanks for this model. Very cool architecture :D