
Tips on how to use RevGrad

Open RSKothari opened this issue 3 years ago • 6 comments

This question is for users experienced with RevGrad. What is the recommended approach to using RevGrad?

  1. Do we first train without RevGrad and then fine-tune with it?
  2. Do we ramp-up the Lambda parameter? Leave it as a learnable parameter?

RSKothari avatar May 07 '21 15:05 RSKothari

This is a really good question!

I actually don't see how lambda (I assume you mean the parameter called alpha in this library) could be learned. It's not implemented as a learnable parameter here, and it only applies during the backward pass to scale the gradients. It doesn't feed into any objective, so I'm not sure it can be optimised directly, only through some sort of second-order optimisation. Please correct me if I'm wrong though.
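To illustrate why alpha receives no gradient, here's a minimal gradient-reversal sketch in plain PyTorch (a hypothetical re-implementation for illustration, not this library's exact code): alpha is stashed on the autograd context and only touches the backward pass, so it never appears in any loss value.

```python
import torch

class GradReverse(torch.autograd.Function):
    # Hypothetical minimal sketch of a gradient reversal layer.
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha        # a plain float, not a tensor in the graph
        return x.view_as(x)      # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # alpha only scales the negated gradient; it is never part of
        # any objective, so autograd has nothing to differentiate it by.
        return -ctx.alpha * grad_output, None

x = torch.ones(3, requires_grad=True)
y = GradReverse.apply(x, 0.5)
y.sum().backward()
print(x.grad)  # tensor([-0.5000, -0.5000, -0.5000])
```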

I have personally not done much experimentation with this, and usually keep it at a low constant. I can imagine that increasing it later in training may work, but I also wonder whether its relative weight in the gradients naturally increases as the main objective is learned (and those gradients shrink).
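For reference, one known ramp-up schedule comes from the original DANN paper (Ganin & Lempitsky, 2015), which anneals lambda from 0 to 1 over the course of training rather than learning it. A small sketch, where `p` is training progress from 0 to 1 and `gamma=10` is the value used in the paper:

```python
import math

def dann_lambda(p, gamma=10.0):
    """Ramp lambda from 0 toward 1 as training progress p goes 0 -> 1.

    Schedule from Ganin & Lempitsky (2015); gamma controls how
    quickly the ramp saturates.
    """
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0

print(round(dann_lambda(0.0), 3))  # 0.0
print(round(dann_lambda(0.5), 3))  # 0.987
print(round(dann_lambda(1.0), 3))  # 1.0
```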

Would be cool to do some experiments with this. What are you attempting to do?

janfreyberg avatar May 08 '21 08:05 janfreyberg

@janfreyberg Sorry for the late response. I am attempting to produce a network that is invariant to domain identity. I think alpha can be backpropagated if it is set to be a learnable parameter, although I am not sure that is a good idea; I can imagine the network completely corrupting the feature space. I find that RevGrad tends to destroy the primary objective if the weight of the loss is not properly balanced. I intend to explore the effect of a ramp-up.

RSKothari avatar May 21 '21 11:05 RSKothari

Cool. I'm still unclear if backpropping the parameter will be possible, but maybe you could try it? If you fork the repo and set it to be a learnable parameter, let me know how it goes and I can merge it if it works well.

And yes, fully agree that the balance has to be struck pretty carefully, because it can have a detrimental effect on performance. It would be great to see how performance on the domain label changes, too, since you'd ideally want it to be at chance by the time you finish training. Some experiments charting primary metrics, domain metrics, and gradient scaling over time would be very interesting.

janfreyberg avatar May 24 '21 07:05 janfreyberg

https://github.com/janfreyberg/pytorch-revgrad/issues/6#issuecomment-845889281 Hello, I've been using a GRL (gradient reversal layer) recently. If I set lambda larger, the main task loss increases greatly; if I set it smaller, the test results barely change (or even get worse). How should I balance the losses? Can you give me some suggestions? Thank you.

dingtao1 avatar Nov 12 '21 08:11 dingtao1

Hi @dingtao1,

my recommendation would be to sweep over the parameter, between 0 and 1, and to track the accuracy (or MSE, or whatever metric is appropriate) for the label you are using RevGrad for, and your main label of interest. Ideally, you'd find a point somewhere that reduces the accuracy on the RevGrad label, while keeping the performance on your main label as high (or nearly as high) as if you had no RevGrad at all (this should be your baseline).
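The sweep described above could be sketched like this (hypothetical code: `train_and_eval` is a stand-in for your own training loop, assumed to return a main-task metric and a domain-label metric for a given alpha):

```python
def sweep(train_and_eval, alphas=(0.0, 0.1, 0.25, 0.5, 1.0)):
    """Run the training loop once per alpha and collect both metrics.

    alpha=0.0 serves as the no-RevGrad baseline. Prefer the alpha that
    pushes domain accuracy toward chance while keeping main accuracy
    close to that baseline.
    """
    results = {}
    for alpha in alphas:
        main_metric, domain_metric = train_and_eval(alpha)
        results[alpha] = (main_metric, domain_metric)
    return results
```

For example, `sweep(my_training_fn)` would return a dict mapping each alpha to its `(main, domain)` metric pair, which you can then plot against the alpha=0 baseline.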

Lastly, one recommendation is to scale the losses so their magnitudes roughly match. For example, if you are using RevGrad on a regression target and your main target is classification, I would look at the typical values of both losses and rescale one so they are of a similar size. This is independent of the lambda parameter in RevGrad.
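One simple way to do that rescaling (a sketch under the assumption that you just want the two loss magnitudes to match; the helper name and the detached-ratio heuristic are mine, not part of the library):

```python
import torch

def balanced_total(main_loss, domain_loss, eps=1e-8):
    # Detach so the scale factor is treated as a constant and does
    # not itself receive gradients; eps guards against division by 0.
    scale = main_loss.detach() / (domain_loss.detach() + eps)
    return main_loss + scale * domain_loss

main = torch.tensor(0.5)
domain = torch.tensor(5.0)
total = balanced_total(main, domain)
print(round(total.item(), 4))  # 1.0  -> 0.5 + 0.1 * 5.0
```

In practice you might compute the scale once from a few warm-up batches rather than per step, so the effective weighting stays stable during training.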

Hope that helps

janfreyberg avatar Nov 13 '21 16:11 janfreyberg

Hi, I have a question about training the model for domain adaptation: are both loss functions trained simultaneously, or do I first train the task loss on the source domain and then train the model on the domain task?

ammarlam10 avatar Nov 21 '22 14:11 ammarlam10