Umar Khalid comments

Results: 8 comments by Umar Khalid

Vit-B IN21K weights

It gives the following error when I use the above weights:

    File "/home/taoyang/PycharmProjects/AdaptFormer-main/util/pos_embed.py", line 122, in interpolate_pos_embed_ori
        pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
    RuntimeError: shape '[-1, 14,...

Vit-B IN21K weights

I think we have to delete the following keys: `del checkpoint_model['cls_token']` and `del checkpoint_model['pos_embed']`.
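A minimal sketch of that key deletion, assuming `checkpoint_model` is a plain dict-like state dict. The helper name `drop_keys` and the stand-in checkpoint values are hypothetical and only for illustration; a real checkpoint would hold tensors.

```python
def drop_keys(checkpoint_model, keys=("cls_token", "pos_embed")):
    """Remove the given keys from a state dict in place.

    Returns the list of keys that were actually present and removed,
    so missing keys do not raise a KeyError.
    """
    removed = []
    for key in keys:
        if key in checkpoint_model:
            del checkpoint_model[key]
            removed.append(key)
    return removed


# Stand-in for a loaded checkpoint (hypothetical values, not real weights).
checkpoint_model = {"cls_token": 0, "pos_embed": 1, "patch_embed.proj.weight": 2}
removed = drop_keys(checkpoint_model)
print("removed:", removed)
print("remaining:", list(checkpoint_model))
```

After dropping the keys, the remaining entries would typically be loaded with PyTorch's `model.load_state_dict(checkpoint_model, strict=False)`, so that the missing `cls_token` and `pos_embed` are tolerated. The model then keeps its own initial values for those two parameters, which may or may not be acceptable for a given experiment.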

MobileViT

Is there any solution for the following line? `example_batch["pixel_values"] = [_val_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]]` The error is `KeyError: 'image'`: example_batch has no "image" key.
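One way to narrow this down is to check which column actually holds the images before applying the transform. The sketch below is only an illustration: the helper names `find_image_key` and `apply_val_transforms` and the alternative column name `"img"` are assumptions, not part of the original script.

```python
def find_image_key(example_batch, candidates=("image", "img")):
    """Return the first candidate column name present in the batch.

    The candidate names are assumptions; adjust them to the dataset in use.
    """
    for name in candidates:
        if name in example_batch:
            return name
    raise KeyError(
        f"no image column among {candidates}; batch has keys {sorted(example_batch)}"
    )


def apply_val_transforms(example_batch, transform):
    """Apply `transform` to every item of the detected image column."""
    key = find_image_key(example_batch)
    example_batch["pixel_values"] = [transform(img) for img in example_batch[key]]
    return example_batch


# Toy batch with integers standing in for PIL images (hypothetical data).
batch = {"img": [1, 2], "label": [0, 1]}
out = apply_val_transforms(batch, lambda x: x * 2)
print(out["pixel_values"])
```

If the error message lists no image-like column at all, the column may have been dropped before the transform ran. With the Hugging Face `Trainer`, one commonly reported cause is `remove_unused_columns=True` in `TrainingArguments`; whether that applies here is not confirmed by the comment.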

Using multiple GPU's?

Has anyone found a solution?

ply issues

> Thanks for the suggestion. We will consider it.

That will be really helpful.

Keys Comparison with Vanilla GS

Can you suggest what changes I need to make in the following code (lines 32-38)? https://github.com/antimatter15/splat/blob/main/convert.py

Why not text to 4D?

My understanding is that SVD can take text input. But the AYG pipeline is similar: they first generate a 3D object from text and then employ a video diffusion model. Did...

Why not text to 4D?

Well, I haven't fully explored their model, but I remember from reading the paper that they do mention text-to-video results (Figure 1, row 1). Also, the abstract indicates it....