Phil Wang

Results: 812 comments of Phil Wang

this would be huge! you have no idea the needless complexity i have written up in the past https://github.com/lucidrains/point-transformer-pytorch/blob/main/point_transformer_pytorch/point_transformer_pytorch.py#L13 lol

@arogozhnikov whatever you think is best Alex! :pray: just wanted to incept the idea :)

looks great! can't wait for the official release :P

@Atul997 this is a nice scheme https://github.com/lucidrains/vit-pytorch#learnable-memory-vit

oh hey! yeah, I believe it is equivalent. at the least, I could do an extra rearrange on the predicted pixels to get back the reconstructed image

@wnma3mz i think being able to get back the reconstructed image is a good idea, let me get the function out when i find some time. feel free to leave...

@chrisway613 Hi Chris! while this is true, i think leaving untrained parameters in the wrapper class isn't elegant. you can always just concat the CLS tokens onto the `decoder_pos_emb` after...
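a hedged sketch of the concat being suggested, with illustrative shapes and names (`decoder_dim`, `cls_pos_emb` are assumptions, not the wrapper's actual attributes):

```python
import torch

decoder_dim, num_patches = 512, 64
decoder_pos_emb = torch.randn(num_patches, decoder_dim)  # per-patch position embeddings
cls_pos_emb = torch.zeros(1, decoder_dim)                # extra slot for the CLS token

# prepend the CLS position embedding rather than keeping an
# untrained parameter inside the wrapper class
decoder_pos_emb_with_cls = torch.cat((cls_pos_emb, decoder_pos_emb), dim=0)
print(decoder_pos_emb_with_cls.shape)  # torch.Size([65, 512])
```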

@s974534426 it's pretty hard to train a plain ViT from scratch. if you are not Google or Facebook, try https://github.com/lucidrains/vit-pytorch#nest instead; you should have an easier time with that

@Songloading i have no idea! i'm not really familiar with timm - perhaps you can ask Ross about it?

@chokyungjin I don't totally understand your question, but to clarify, the `pred_pixel_values` and `masked_patches` are both in pixel space from the original image. they have just been im2col'd per patch