Phil Wang

Results: 812 comments of Phil Wang

this would be huge! you have no idea the needless complexity i have written up in the past https://github.com/lucidrains/point-transformer-pytorch/blob/main/point_transformer_pytorch/point_transformer_pytorch.py#L13 lol

@arogozhnikov whatever you think is best Alex! :pray: just wanted to incept the idea :)

looks great! can't wait for the official release :P

@Atul997 this is a nice scheme https://github.com/lucidrains/vit-pytorch#learnable-memory-vit

oh hey! yeah, I believe it is equivalent. at the least, I could do an extra rearrange on the predicted pixels to get back the reconstructed image

@wnma3mz i think being able to get back the reconstructed image is a good idea, let me get the function out when i find some time. feel free to leave...

@chrisway613 Hi Chris! while this is true, i think leaving untrained parameters in the wrapper class isn't elegant. you can always just concat the CLS tokens onto the `decoder_pos_emb` after...
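a hedged sketch of the concat being suggested, with illustrative shapes and names (`decoder_dim`, `cls_pos_emb` are assumptions, not the wrapper's actual attributes):

```python
import torch

decoder_dim, num_patches = 512, 64
decoder_pos_emb = torch.randn(num_patches, decoder_dim)  # per-patch position embeddings
cls_pos_emb = torch.zeros(1, decoder_dim)                # extra slot for the CLS token

# prepend the CLS position embedding rather than keeping an
# untrained parameter inside the wrapper class
decoder_pos_emb_with_cls = torch.cat((cls_pos_emb, decoder_pos_emb), dim=0)
print(decoder_pos_emb_with_cls.shape)  # torch.Size([65, 512])
```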

@s974534426 it's pretty hard to train a plain ViT from scratch. if you are not Google or Facebook, try https://github.com/lucidrains/vit-pytorch#nest instead; you should have an easier time with that

@Songloading i have no idea! i'm not really familiar with timm - perhaps you can ask Ross about it?

@chokyungjin I don't totally understand your question, but to clarify, the `pred_pixel_values` and `masked_patches` are both in pixel space from the original image. they have just been im2col'd per patch