
Request for clarification on implementation

Open lorenzbaraldi opened this issue 2 years ago • 5 comments

Hi, after reading your paper and studying the code, I don't understand why `VisionTransformerForMaskedImageModeling` has two copies of the encoder (the `encoder` and the teacher model). Why is it not possible to use just the encoder, since they both seem to have the same parameters?

lorenzbaraldi avatar Jan 11 '23 15:01 lorenzbaraldi

Hi, it is ok to use just the encoder. The so-called teacher model is actually identical to the encoder in both structure and parameters.

SelfSup-MIM avatar Jan 11 '23 15:01 SelfSup-MIM

Even for pre-training purposes? I'm trying to pre-train the model from scratch and I don't understand the purpose of the teacher model. Why is it not possible to do `with torch.no_grad(): latent_target = self.encoder(x, bool_masked_pos=(~bool_masked_pos))` instead of `with torch.no_grad(): latent_target = self.teacher(x, bool_masked_pos=(~bool_masked_pos))`?

lorenzbaraldi avatar Jan 11 '23 15:01 lorenzbaraldi

It is ok to do so. This codebase defines a separate teacher model class to give other researchers the option of using EMA, although EMA is not used in CAE.
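For reference, an EMA teacher update typically looks like this. This is only a minimal sketch with a hypothetical `ema_update` helper, not code taken from this repository:

```python
import torch

@torch.no_grad()
def ema_update(teacher, encoder, momentum=0.999):
    # Exponential moving average: the teacher's parameters slowly track the encoder's.
    # With momentum = 0.0 this reduces to a plain copy, which makes the teacher
    # identical to the encoder (the CAE setting, where EMA is not used).
    for p_t, p_e in zip(teacher.parameters(), encoder.parameters()):
        p_t.data.mul_(momentum).add_(p_e.data, alpha=1.0 - momentum)
```

Since the teacher is kept identical to the encoder in CAE, computing the targets with the encoder itself under `torch.no_grad()` gives the same result.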

SelfSup-MIM avatar Jan 11 '23 15:01 SelfSup-MIM

Ok thank you for the explanation

lorenzbaraldi avatar Jan 11 '23 15:01 lorenzbaraldi

Hi, sorry to bring up a one-year-old issue, but I am unsure about an implementation detail.

https://github.com/lxtGH/CAE/blob/d72597143e486d9bbbaf1e3adc4fd9cfa618633a/models/modeling_cae.py#L114C58-L114C58

Here you update the teacher after computing the latent targets. Doesn't this mean that the target computation uses the teacher parameters from the previous step? If so, the teacher and the encoder parameters are not synced. Is that intentional?
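To illustrate the ordering I mean, here is a simplified sketch (hypothetical, not copied from the repository):

```python
# Simplified forward step, assuming teacher and encoder share the same architecture:
with torch.no_grad():
    # 1) Targets are computed with the teacher as it was after the PREVIOUS step.
    latent_target = self.teacher(x, bool_masked_pos=(~bool_masked_pos))

# 2) Only afterwards are the teacher parameters synced to the current encoder,
#    so step (1) used parameters that lag one update behind.
with torch.no_grad():
    for p_t, p_e in zip(self.teacher.parameters(), self.encoder.parameters()):
        p_t.data.copy_(p_e.data)
```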

treasan avatar Jan 05 '24 20:01 treasan