Tianhong Li
In fact we started from MaskGIT's BERT architecture, but we found that both linear probing and unconditional generation performance were poor (57.4% accuracy, 20.7 FID). We then found that adopting the...
We must use image tokens as both input and output to enable image generation, because generation proceeds over multiple steps. In the middle of generation, only part of the tokens...
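To make the multi-step point concrete, here is a toy sketch of MaskGIT-style iterative decoding, not the actual MAGE code; the mask id, vocabulary size, sequence length, and the keep-half confidence heuristic are all made-up placeholders:

```python
import random

random.seed(0)

MASK = -1          # hypothetical mask token id
VOCAB = 16         # toy vocabulary size
NUM_TOKENS = 8     # toy sequence length
NUM_ITERS = 4

def predict(tokens):
    """Stand-in for the transformer: returns a (token, confidence) guess per position."""
    return [(random.randrange(VOCAB), random.random()) for _ in tokens]

tokens = [MASK] * NUM_TOKENS           # start fully masked
for _ in range(NUM_ITERS):
    preds = predict(tokens)
    # positions still masked, most confident first
    masked = sorted((i for i, t in enumerate(tokens) if t == MASK),
                    key=lambda i: -preds[i][1])
    # unmask a fraction of the most confident positions each step; the
    # partially-filled token sequence is fed back in as *input* on the next
    # iteration, which is why tokens must serve as both input and output
    keep = max(1, len(masked) // 2)
    for i in masked[:keep]:
        tokens[i] = preds[i][0]
```

After the loop, every position holds a generated token id; earlier iterations see a sequence that is part real tokens, part mask tokens.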
We did not try that, but it could be an option for creating the token_emb. We use a new module, token_emb, because we simply regard the tokens as words, and...
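Treating VQ token ids like words means token_emb is conceptually just a learned lookup table, the same shape of thing as a word-embedding table in BERT. A minimal dependency-free sketch (the codebook size and embedding width here are made up, and a real implementation would use a learnable nn.Embedding):

```python
import random

random.seed(0)

VOCAB_SIZE = 1024   # hypothetical codebook size
EMBED_DIM = 4       # toy embedding width

# token_emb as a plain lookup table: one vector per token id,
# exactly like a word-embedding table in a language model
token_emb = [[random.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
             for _ in range(VOCAB_SIZE)]

def embed(token_ids):
    """Map a sequence of VQ token ids to their embedding vectors."""
    return [token_emb[t] for t in token_ids]

vectors = embed([3, 512, 3])
# identical token ids map to the same vector
```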
We may release a colab for image editing in the next few months.
Yes, the temperature selection is a bit tricky. For iter=1, we use argmax and temp=0.0, if I remember correctly. For iter=6, we use categorical sampling with temp=4.5.
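For reference, this is how temperature usually enters categorical sampling: logits are divided by the temperature before the softmax, so temp=0.0 degenerates to argmax and a large temp like 4.5 flattens the distribution. A stdlib sketch with made-up logits (not the repo's sampling code):

```python
import math
import random

random.seed(0)

def sample_with_temperature(logits, temp):
    """Sample a token index from logits; temp == 0.0 means greedy argmax."""
    if temp == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temp for l in logits]       # higher temp -> flatter distribution
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # inverse-CDF categorical draw
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [1.0, 3.0, 0.5]                        # hypothetical logits for 3 tokens
greedy = sample_with_temperature(logits, 0.0)   # the iter=1 setting: argmax
sampled = sample_with_temperature(logits, 4.5)  # the iter=6 setting: near-uniform
```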
You can use either ViT-B or ViT-L for that; ViT-L will give you slightly better visual quality.
> I add an inference code, but looks not correct. Here is a input and output for my code.
>
> The inference code is based on gen_img_uncond.py,...
It should follow the same structure as the ImageNet data: train/class_name/images.png and val/class_name/images.png.
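As a sketch, setting up that layout looks like the following; the dataset root "my_dataset" and the class folder names are placeholders, not names the repo requires:

```shell
# ImageNet-style layout: one subfolder per class under train/ and val/
mkdir -p my_dataset/train/cat my_dataset/train/dog
mkdir -p my_dataset/val/cat my_dataset/val/dog
# then place images inside the class folders, e.g.:
# cp /path/to/cat_001.png my_dataset/train/cat/
```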
My suggestion is to replace the default ImageNet dataloader with your own. Once that is done, you can use the unlabeled image data with main_pretrain.py and use the labeled...
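A custom dataloader for unlabeled images can be sketched as below. This is a dependency-free illustration only: a real version would subclass torch.utils.data.Dataset, decode and transform the image, and be passed to main_pretrain.py in place of the ImageNet loader; the class name, folder, and extensions are all hypothetical:

```python
import os
import tempfile

class UnlabeledImageDataset:
    """Minimal sketch of a dataset over a flat folder of unlabeled images.

    Returns file paths to stay dependency-free; a torch version would
    return a transformed image tensor from __getitem__ instead.
    """
    EXTS = (".png", ".jpg", ".jpeg")

    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, f)
            for f in os.listdir(root)
            if f.lower().endswith(self.EXTS)
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return self.paths[idx]

# toy usage with a temporary folder of dummy files
root = tempfile.mkdtemp()
for name in ("a.png", "b.jpg", "notes.txt"):
    open(os.path.join(root, name), "w").close()

ds = UnlabeledImageDataset(root)
# non-image files like notes.txt are skipped
```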
We only use the ImageNet images and never use the label information during pre-training.