The performance of VAR Tokenizer
What is the performance of the VAR tokenizer? It is trained on OpenImages, while some other VQGAN tokenizers are trained only on ImageNet. I wonder how much of the performance gain comes from the pretraining data.
hi @youngsheen, more VQVAE evals are coming in the next paper update.
We trained the VQVAE on OpenImages, following VQGAN (see https://github.com/CompVis/taming-transformers?tab=readme-ov-file#overview-of-pretrained-models).
We actually found that training the VQVAE directly on ImageNet yields slightly better results than OpenImages. We kept using OpenImages to stay aligned with our VQGAN baseline.
Is the tokenizer also able to do understanding?
I used the VQVAE in VAR and compared the image produced by encoding and decoding with the original image, as follows.
Is this because the generalization performance of the VQVAE is not good enough?
@huxiaotaostasy please make sure you denormalize and clamp the VQVAE output, e.g. `out = out.mul(0.5).add_(0.5).clamp_(0, 1)`.
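A minimal sketch of the full round trip, assuming the input was normalized to [-1, 1] at training time and that `vae(img)` returns the reconstruction (the exact encode/decode entry points in the repo may differ, so treat that call as a placeholder):

```python
import torch
from PIL import Image
from torchvision import transforms

# Preprocess to [-1, 1], matching the normalization assumed for training.
to_tensor = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
img = to_tensor(Image.open('input.jpg').convert('RGB')).unsqueeze(0)  # 1x3x256x256

with torch.no_grad():
    # Placeholder: substitute the repo's actual reconstruction call here.
    out = vae(img)
    if isinstance(out, (tuple, list)):
        out = out[0]

# Denormalize from [-1, 1] back to [0, 1] and clamp before viewing/saving.
out = out.mul(0.5).add_(0.5).clamp_(0, 1)
transforms.ToPILImage()(out.squeeze(0)).save('recon.png')
```

Without the final `mul/add/clamp` step the saved image will look washed out or clipped even when the reconstruction itself is fine.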
@luohao123 maybe you can create token maps (r1, r2, ..., rK) by repeating one index in [0, V-1] on all scales and then decode them to see what the reconstructed image looks like.
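A sketch of that experiment; the `patch_nums` list, the codebook size `V`, and the decode call `vae.idxBl_to_img(...)` are assumptions here, so swap in whatever the VQVAE in your checkout actually exposes:

```python
import torch

# Per-scale token-map sizes (assumed 256x256 multi-scale configuration).
patch_nums = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)
V = 4096      # codebook size (assumption)
idx = 123     # any single code index in [0, V-1]
B = 1

# Build r1..rK: every scale filled with the same codebook index.
idx_Bl = [torch.full((B, pn * pn), idx, dtype=torch.long) for pn in patch_nums]

with torch.no_grad():
    # Assumed decode entry point; adjust the name/arguments to the actual VQVAE API.
    img = vae.idxBl_to_img(idx_Bl, same_shape=True, last_one=True)

img = img.mul(0.5).add_(0.5).clamp_(0, 1)  # denormalize to [0, 1] as above
```

If the decoder is healthy, each constant-index input should produce a roughly uniform texture/color, which is a quick sanity check on the codebook and decoder.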