enhancing-transformers
Reconstruction results
Hi, first of all, thanks for your work.
Working with ViT-small, I see that the results are far from VQGAN's. Did you stop training once the model reached convergence? Do you think there is more room to improve its performance?
Results with ViT-small:
Input image:
Can you show me your code for reconstruction? I also ran into this problem: the reconstruction results of the ViT-VQGAN on ImageNet are very poor.
```python
import PIL
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as TF
from PIL import Image
from omegaconf import OmegaConf
from enhancing.utils.general import initialize_from_config  # repo helper, as in the colab notebook

# Build the ViT-VQGAN (small) and load the released ImageNet checkpoint
config = OmegaConf.load('configs/imagenet_vitvq_small.yaml')
model = initialize_from_config(config.model)
model.init_from_ckpt('/home/marcelo/Downloads/imagenet_vitvq_small.ckpt')

def preprocess(img):
    # Resize so the shorter side becomes 1024, then take a 256x256 center crop
    s = min(img.size)
    if s < 256:
        raise ValueError(f'min dim for image {s} < 256')
    r = 1024 / s
    s = (round(r * img.size[1]), round(r * img.size[0]))
    img = TF.resize(img, s, interpolation=PIL.Image.LANCZOS)
    img = TF.center_crop(img, output_size=2 * [256])
    img = torch.unsqueeze(T.ToTensor()(img), 0)  # (1, C, H, W) in [0, 1]
    return img

original = Image.open('/home/marcelo/Downloads/212861459-e4113b34-622d-4602-afe4-f20e2d79425c.png')
image = preprocess(original)
image = image[:, :3, :, :]  # drop the alpha channel if the PNG has one

# Encode to quantized latents and decode back to pixels
quant, _ = model.encode(image)
dec = model.decode(quant)
```
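To eyeball the result, the decoded tensor can be converted back to a PIL image. A minimal sketch, assuming `dec` is a (1, 3, 256, 256) float tensor with values in [0, 1]:

```python
import torchvision.transforms as T

# Clamp to the valid range and turn the single-image batch back into a PIL image
recon = T.ToPILImage()(dec.squeeze(0).clamp(0, 1))
recon.save('reconstruction.png')  # compare side by side with `original`
```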
> The same as the one in the colab notebook

Actually, I think the reason is a bad model checkpoint. Your script is right. I measured the rFID, and it is far from VQGAN's. I also trained the model on ImageNet myself, but it still works badly.
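For reference, rFID here means the FID between validation images and their reconstructions. A minimal sketch with `torchmetrics` (not the exact script used above; `real_batches` and `recon_batches` are hypothetical iterables you would build from the validation set):

```python
from torchmetrics.image.fid import FrechetInceptionDistance

# FID between real images and their reconstructions ("rFID").
# Both inputs are assumed to be uint8 tensors shaped (N, 3, 256, 256).
fid = FrechetInceptionDistance(feature=2048)
for real, recon in zip(real_batches, recon_batches):
    fid.update(real, real=True)
    fid.update(recon, real=False)
print(fid.compute())
```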
So, after your training, did you obtain better model weights that improve the reconstruction?