Umar Khalid
It gives the following error when I use the above weights:
  File "/home/taoyang/PycharmProjects/AdaptFormer-main/util/pos_embed.py", line 122, in interpolate_pos_embed_ori
    pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
RuntimeError: shape '[-1, 14,...
I think we have to delete the following keys: `del checkpoint_model['cls_token']` and `del checkpoint_model['pos_embed']`.
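A minimal sketch of that key-dropping step, assuming `checkpoint_model` is a plain dict of parameter names to values (a stand-in for a real state dict). The helper name `drop_keys` and the sample key `blocks.0.attn.qkv.weight` are hypothetical, and whether removing these keys resolves the reshape error above is not confirmed here.

```python
def drop_keys(checkpoint_model, keys=("cls_token", "pos_embed")):
    """Return a copy of the checkpoint without `keys`, plus the keys actually removed.

    `drop_keys` is a hypothetical helper; keys that are absent are ignored,
    so it never raises KeyError the way a bare `del` would.
    """
    removed = [k for k in keys if k in checkpoint_model]
    cleaned = {k: v for k, v in checkpoint_model.items() if k not in keys}
    return cleaned, removed


# Stand-in checkpoint: lists instead of tensors, purely for illustration.
checkpoint_model = {
    "cls_token": [0.0],
    "pos_embed": [0.0, 0.0, 0.0, 0.0],
    "blocks.0.attn.qkv.weight": [1.0],  # hypothetical parameter name
}

cleaned, removed = drop_keys(checkpoint_model)
print("removed:", removed)
print("kept:", list(cleaned))
```

With a PyTorch model, the cleaned dict would typically be loaded with `model.load_state_dict(cleaned, strict=False)`, so the dropped entries keep the model's own initial values.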
Is there any solution for `example_batch["pixel_values"] = [_val_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]]`? The error is `KeyError: 'image'`: `example_batch` has no "image" key.
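One way to narrow this down is to look up the image column by name and report the keys that are actually present. This is a sketch, assuming the batch is a plain dict of columns; the helper `apply_val_transforms`, the alternative column name `"img"`, and the `FakeImage` stand-in are all hypothetical and not taken from the original script.

```python
def apply_val_transforms(example_batch, transform, candidates=("image", "img")):
    """Fill `pixel_values` from the first candidate column found in the batch.

    Hypothetical helper: `candidates` is a guess at likely column names.
    Raises KeyError listing the available keys if none is present.
    """
    for name in candidates:
        if name in example_batch:
            example_batch["pixel_values"] = [
                transform(img.convert("RGB")) for img in example_batch[name]
            ]
            return example_batch
    raise KeyError(
        f"no image column among {candidates}; batch has keys {sorted(example_batch)}"
    )


class FakeImage:
    """Stand-in for a PIL image; only mimics `.convert(mode)`."""

    def convert(self, mode):
        return f"converted-{mode}"


# The image column is called "img" here, so a lookup by "image" alone would fail.
batch = {"img": [FakeImage(), FakeImage()], "label": [0, 1]}
out = apply_val_transforms(batch, transform=lambda img: img)
print(out["pixel_values"])
```

If the script uses the Hugging Face `Trainer`, it may also be worth checking whether `remove_unused_columns=False` is set in `TrainingArguments`, since the default can drop columns the model does not consume; that is a possibility, not something the post confirms.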
Has anyone found any solution?
> Thanks for the suggestion. We will consider it.

That will be really helpful.
Can you suggest what changes I need to make in the following code (lines 32-38)? https://github.com/antimatter15/splat/blob/main/convert.py
My understanding is that SVD can take text input. But the AYG pipeline is similar: they first generate a 3D object from text and then employ a video diffusion model. Did...
Well, I haven't fully explored their model. But I remember from reading the paper that they do mention text-to-video results (Figure 1, row 1). Also, the abstract indicates it....