
The performance of VAR Tokenizer

Open youngsheen opened this issue 10 months ago • 5 comments

What is the performance of the VAR tokenizer? It is trained on OpenImages, while some other VQGAN tokenizers are trained on ImageNet only. I wonder how much of the performance gain comes from the pre-training data.

youngsheen avatar Apr 09 '24 18:04 youngsheen

hi @youngsheen, more VQVAE evals are coming in the next paper update.

We trained the VQVAE on OpenImages, following VQGAN (see https://github.com/CompVis/taming-transformers?tab=readme-ov-file#overview-of-pretrained-models).

We actually found that training the VQVAE directly on ImageNet yields slightly better results than OpenImages. We kept using OpenImages to stay aligned with our VQGAN baseline.

keyu-tian avatar Apr 09 '24 19:04 keyu-tian

Is the tokenizer able to do image understanding?

luohao123 avatar Apr 11 '24 07:04 luohao123

I used the VQVAE in VAR and compared the image produced by encoding and decoding against the original image, as shown below. [reconstruction comparison images] Is this because the generalization of the VQVAE is not good enough?

huxiaotaostasy avatar Apr 11 '24 09:04 huxiaotaostasy

@huxiaotaostasy please make sure you denormalize and clamp the VQVAE output with out = out.mul(0.5).add_(0.5).clamp_(0, 1).
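A minimal sketch of that post-processing step, assuming the decoder emits images normalized to [-1, 1] (the variable names and the toy tensor below are illustrative, not from the VAR codebase):

```python
import torch

def denormalize(out: torch.Tensor) -> torch.Tensor:
    """Map a decoder output from [-1, 1] back to [0, 1] for display or saving."""
    # mul(0.5) returns a new tensor, so the in-place add_/clamp_ do not mutate `out`
    return out.mul(0.5).add_(0.5).clamp_(0, 1)

# toy decoder output covering in-range and out-of-range values
out = torch.tensor([-1.0, 0.0, 1.0, 2.0])
img = denormalize(out)  # tensor([0.0000, 0.5000, 1.0000, 1.0000])
```

Skipping this step makes reconstructions look washed out or distorted even when the VQVAE itself is fine, since image viewers expect pixel values in [0, 1].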

keyu-tian avatar Apr 12 '24 13:04 keyu-tian

@luohao123 maybe you can create token maps (r1, r2, ..., rK) by repeating one index in [0, V-1] across all scales, then decode them to see what the reconstructed image looks like.
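A sketch of building such constant token maps, assuming VAR's default multi-scale schedule and codebook size (both values, and the decoder call mentioned in the comment, are assumptions to verify against the repo):

```python
import torch

# assumed multi-scale side lengths and codebook size, matching VAR's defaults
patch_nums = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)
V = 4096
idx = 123  # any fixed code index in [0, V-1]

# one flattened token map per scale, every position holding the same index
token_maps = [torch.full((pn * pn,), idx, dtype=torch.long) for pn in patch_nums]

# these maps could then be fed to the VQVAE decoder (something like
# vqvae.idxBl_to_img(...) in the VAR repo) to inspect what a single
# codebook entry decodes to when repeated at every scale
```

Sweeping `idx` over the codebook this way gives a rough qualitative view of what individual codes encode, which can help judge whether reconstruction issues come from the tokenizer or from post-processing.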

keyu-tian avatar Apr 12 '24 13:04 keyu-tian