Does mup work with model with Conv2D as output?

Open BurguerJohn opened this issue 2 years ago • 5 comments

Hello, this project looks great and the GitHub documentation is really good. Just wondering if mup would work with a model that has nn.Conv2d as its last layer instead of nn.Linear.

BurguerJohn avatar Mar 09 '22 16:03 BurguerJohn

Hi BurguerJohn,

We haven't implemented a mu-version of Conv2d to use as the output layer, but we can certainly do it! It seems slightly unusual to us to use Conv2d as the output layer. Could you tell us more about your model?

edwardjhu avatar Mar 09 '22 18:03 edwardjhu

It's more of a curiosity test; I would like to see how it would perform in a U-Net model.

BurguerJohn avatar Mar 09 '22 19:03 BurguerJohn

In our case, we use ConvTranspose2d (1d, 3d) as output, but basically that should behave like Conv2d and Linear.

May I ask about the progress of the muconv2d branch?

tivek avatar Apr 22 '22 22:04 tivek

Hi both,

Thanks for your patience regarding this issue. The muconv2d branch should work in principle, but I haven't added test cases since it requires the labels to be the output of a conv layer. If there is interest, we'd love to invite you to give it a try in your code and see if you can reproduce the coordinate check plots in the README. We are happy to help debug if you run into any issues!

edwardjhu avatar May 01 '22 20:05 edwardjhu

Hi @edwardjhu, thanks for the kind reply!

My team certainly plans to make coordinate check plots of our models with ConvTranspose output layers. At this point we are working around lr schedulers that are not compatible with mup's optimizers. When we are ready, I will post the results here.

tivek avatar May 10 '22 08:05 tivek

Closing this issue for now, but feel free to re-open when there are new updates.

thegregyang avatar Dec 05 '22 16:12 thegregyang

A belated and short update: the muconv2d branch is working fine for us. If desirable, I can whip up a coord_check plot for a toy model with MuOutConvTranspose1d.

tivek avatar Feb 25 '24 19:02 tivek

sure!

edwardjhu avatar Feb 25 '24 22:02 edwardjhu
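
For readers who land on this thread wondering what a conv output layer with readout-style scaling might look like, here is a minimal sketch. It is not the code from the muconv2d branch, which is not shown in this thread. The class name `ScaledOutConv2d` and the hand-passed `width_mult` argument are hypothetical; the sketch only assumes that the readout idea is to divide the layer's input by a width multiplier (fan-in width over base width) before applying an ordinary convolution.

```python
import torch
import torch.nn as nn


class ScaledOutConv2d(nn.Conv2d):
    """Hypothetical conv output layer whose input is divided by a width multiplier.

    `width_mult` (fan-in width / base width) is supplied by hand here. That is
    an assumption made for illustration, not how mup tracks shapes.
    """

    def __init__(self, in_channels, out_channels, kernel_size,
                 width_mult=1.0, output_mult=1.0, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        self.width_mult = float(width_mult)
        self.output_mult = float(output_mult)

    def forward(self, x):
        # Scale the incoming activations, then apply the ordinary convolution.
        return super().forward(self.output_mult * x / self.width_mult)


torch.manual_seed(0)
base_width, width = 8, 32  # illustrative channel counts

# 1x1 conv used as the final layer of a toy model, without bias.
layer = ScaledOutConv2d(width, 1, kernel_size=1,
                        width_mult=width / base_width, bias=False)

x = torch.randn(2, width, 4, 4)
y = layer(x)

# Reference: the same weights applied without the input scaling.
y_plain = nn.functional.conv2d(x, layer.weight)
```

With `bias=False` the scaled output is simply the plain convolution output divided by `width / base_width`. The same pattern should carry over to `nn.ConvTranspose2d` by changing the base class, though that variant is untested here.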