Tianhong Li
> For Figure 1 in the paper, as mentioned in the caption, "the mask for MAGE is on semantic tokens whereas that of MAE is on patches in the input...
The mask of MAGE is always on tokens -- even when the original mask is specified on pixels (as in the inpainting scenario), we need to convert it into masking...
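In case it is useful, here is a minimal sketch of what I mean by converting a pixel mask to a token mask (the downsample factor of 16 and the "any masked pixel masks the whole token" rule are assumptions for illustration, not necessarily what the released code does):

```python
import torch
import torch.nn.functional as F

def pixel_mask_to_token_mask(pixel_mask, downsample=16):
    """pixel_mask: [B, H, W] binary mask (1 = masked pixel).

    Returns a token-level mask of shape [B, H // downsample, W // downsample];
    a token is marked as masked if any pixel inside its patch is masked.
    """
    pixel_mask = pixel_mask.float().unsqueeze(1)                    # [B, 1, H, W]
    token_mask = F.max_pool2d(pixel_mask, kernel_size=downsample)   # [B, 1, h, w]
    return token_mask.squeeze(1).bool()
```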
Thanks for your interest! The masking ratio is left-truncated at 0.5 so that we can always drop 50% of the input tokens in the ViT encoder, which largely saves...
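A rough sketch of the left-truncated sampling (the Gaussian parameters below are placeholders; the point is only that `mr >= 0.5` always holds, so half of the input tokens can always be dropped):

```python
import numpy as np

def sample_masking_ratio(mean=0.55, std=0.25, lo=0.5, hi=1.0):
    """Sample a masking ratio from a Gaussian left-truncated at `lo`
    via rejection sampling. Because the result is always >= 0.5, the
    ViT encoder can always drop 50% of the input tokens."""
    while True:
        mr = np.random.normal(mean, std)
        if lo <= mr <= hi:
            return mr
```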
Our evaluation protocol is based on both FID and linear probing accuracy -- once we train a model with a given set of hyper-parameters, we evaluate it on ImageNet and pick the best...
We tried using the CLS token. However, the performance is not very stable -- it normally achieves performance similar to average-pooled features, but occasionally it gets very poor accuracy (~10%)...
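For concreteness, the two options look roughly like this (the `[B, 1 + N, D]` layout with CLS at index 0 is an assumption about the feature shape, not an excerpt from the code):

```python
import torch

def probe_feature(encoder_tokens, use_cls=False):
    """encoder_tokens: [B, 1 + N, D] with the CLS token at index 0.

    For linear probing, average-pooled patch features were more stable
    than the CLS token in our experiments."""
    if use_cls:
        return encoder_tokens[:, 0]            # CLS token, [B, D]
    return encoder_tokens[:, 1:].mean(dim=1)   # average-pooled patches, [B, D]
```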
The smallest batch size I tested is 1024, which gives similar performance. Since the learning rate is scaled w.r.t. the batch size, I guess the performance will not...
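The scaling I am referring to is the usual linear rule from MAE-style codebases (the base batch size of 256 is that convention, so treat it as an assumption rather than an exact excerpt from our config):

```python
def scaled_lr(base_lr, batch_size, base_batch_size=256):
    """Linear learning-rate scaling: lr = base_lr * batch_size / base_batch_size."""
    return base_lr * batch_size / base_batch_size

# e.g. base_lr = 1.5e-4 with batch_size = 4096 gives lr = 1.5e-4 * 16 = 2.4e-3
```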
The implementation works as follows: during training, a masking ratio (`mr`) between 0.5 and 1 is sampled for each iteration to mask out the input image tokens. Since `mr` is always...
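A hedged sketch of that per-iteration procedure (tensor names and the exact bookkeeping are illustrative; the released code may order these steps differently):

```python
import math
import torch

def mask_and_drop(tokens, mask_token, mr, drop_ratio=0.5):
    """tokens: [B, N, D] tokenized image features; mask_token: [D].

    Mark ceil(mr * N) tokens as masked, replace them with the learnable
    mask token, then drop a fixed 50% of all positions before the ViT
    encoder. The dropped positions are taken from the masked set, which
    is always possible because mr >= 0.5."""
    B, N, D = tokens.shape
    num_mask = math.ceil(mr * N)
    num_keep = N - int(drop_ratio * N)

    # Random permutation per sample; the first `num_mask` shuffled positions are masked.
    ids_shuffle = torch.argsort(torch.rand(B, N, device=tokens.device), dim=1)
    mask = torch.zeros(B, N, dtype=torch.bool, device=tokens.device)
    batch_idx = torch.arange(B, device=tokens.device).unsqueeze(1)
    mask[batch_idx, ids_shuffle[:, :num_mask]] = True

    # Replace masked positions with the mask token.
    tokens = torch.where(mask.unsqueeze(-1), mask_token.expand(B, N, D), tokens)

    # Keep the last `num_keep` shuffled positions: all unmasked tokens plus
    # any leftover masked tokens, so exactly 50% of positions are dropped.
    ids_keep = ids_shuffle[:, -num_keep:]
    encoder_input = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(B, num_keep, D))
    return encoder_input, mask
```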
Hi, thanks for your interest! Yes, `vocab_size` should be `self.codebook_size + 1` when there is no class condition. We set it to `self.codebook_size + 1000 + 1` just because...
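For reference, the token-id layout implied above looks roughly like this (variable names are illustrative, not the ones in the released code):

```python
codebook_size = 1024                            # VQGAN codebook entries, ids [0, 1024)
num_classes = 1000                              # class-condition slots (legacy), ids [1024, 2024)
vocab_size = codebook_size + num_classes + 1    # + 1 extra special token

# Without class conditioning, codebook_size + 1 would already be enough:
vocab_size_no_class_cond = codebook_size + 1
```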
You can actually set it to any value greater than or equal to 1024 and smaller than 1024 + 1000 + 1 -- but the pre-trained model sets it to 1100 (again, a legacy...
Unfortunately, I no longer have access to the original JAX code, so there is no plan to release the contrastive training part. However, that part is quite straightforward if you want...
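If you do want to reimplement it, the contrastive part is essentially a standard InfoNCE/SimCLR-style loss on projected features from two augmented views; the sketch below is only a generic version under that assumption (temperature, projection head, and other details may differ from what we used):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.2):
    """z1, z2: [B, D] projected features from two augmented views."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    B = z1.size(0)

    z = torch.cat([z1, z2], dim=0)          # [2B, D]
    sim = z @ z.t() / temperature           # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))       # exclude self-similarity

    # The positive for index i is the same image's other view at i +/- B.
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)]).to(sim.device)
    return F.cross_entropy(sim, targets)
```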