Mido
Hi @AdityaKane2001, focal masking is just extracting a small crop from an image view. An image view is created via random data augmentations of an image. For efficiency, you can do both simultaneously in...
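A minimal sketch of the "do both simultaneously" idea, assuming a torchvision-style pipeline (the crop size, scale range, and jitter values below are illustrative, not the repo's defaults):

```python
from torchvision import transforms

# One pipeline that applies the random augmentations and takes the small focal
# crop in a single pass over the original image (all values are illustrative).
focal_view = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.05, 0.3)),  # small focal crop
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.2, 0.1),
    transforms.ToTensor(),
])

# usage: focal = focal_view(pil_image)
```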
@AdityaKane2001 If you wish to explicitly separate the image-view generation from focal masking for conceptual reasons, you can create the image views for the focal crops using RandomResizedCrop to a size...
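If you separate the two steps explicitly, the flow would look roughly like this (a sketch only; the view resolution and focal-crop size are assumptions, not defaults from the repo):

```python
from torchvision import transforms

# Step 1: build the image view with the usual random augmentations.
make_view = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
])

# Step 2: focal masking = taking a small crop out of that image view.
focal_crop = transforms.RandomCrop(96)

# usage:
# view = make_view(pil_image)
# focal = focal_crop(view)
```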
Hi @Madhav-MKNC, thanks for your input! These sound like very reasonable changes. Do you want to submit a PR?
Hi @Madhav-MKNC, this also sounds reasonable to me. It should be pretty straightforward; did you want to submit a PR for this change? If not, I can take care of it...
Hi @JUGGHM, at a small image resolution (e.g., 224x224) we didn't see a large enough advantage from longer training on IN1k to justify the additional compute, but I think longer training...
Hi @lezhang7

**Targets:**
1. The image first goes through a ViT to get a sequence of patch-level features.
2. Next, we sample M=4 **blocks** (with aspect ratio and scale that you...
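A minimal sketch of the target pathway in these two steps (tensor shapes, block sizes, and variable names are illustrative assumptions, not the repo's API):

```python
import torch

B, N, D = 2, 196, 768                      # batch, patches (14x14 grid), feature dim
patch_feats = torch.randn(B, N, D)         # stand-in for ViT(image): patch-level features

M = 4                                      # number of sampled target blocks
# Stand-in for block sampling: each "block" is a set of patch indices chosen
# with some aspect ratio / scale (here just random indices for illustration).
block_masks = [torch.randperm(N)[:32] for _ in range(M)]

# The patch features falling inside each block become the regression targets.
targets = [patch_feats[:, idx, :] for idx in block_masks]   # M tensors of shape (B, 32, D)
```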
Hi @Gus-Guo, batch size is not important for the method, since there is no batch component to the loss. However, I expect that changing the batch size might require you to...
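To illustrate what "no batch component" means, here is a hedged sketch of a patch-regression loss in this style (the shapes and the choice of smooth L1 are assumptions for illustration): each sample's loss depends only on its own prediction/target pair, unlike a contrastive loss that compares samples across the batch.

```python
import torch
import torch.nn.functional as F

B, K, D = 8, 16, 768                  # batch, predicted target patches, feature dim
pred = torch.randn(B, K, D)           # predictor outputs (illustrative)
target = torch.randn(B, K, D)         # target-encoder features (illustrative)

# Per-sample regression losses; the batch mean is just an average of independent
# per-sample terms, so there is no interaction between samples in the loss.
per_sample = F.smooth_l1_loss(pred, target, reduction='none').mean(dim=(1, 2))
loss = per_sample.mean()
```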
It's surprising that it's happening already more than an epoch into training. Not sure if there are other processes running on your GPU, but could you try changing `enc_mask_scale` to...
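For context, a rough sketch of why lowering `enc_mask_scale` can help with memory, assuming the scale range bounds the fraction of patches the context encoder sees (the ranges below are assumptions, not verified defaults):

```python
import math
import random

num_patches = 196                       # e.g. a 14x14 patch grid for a 224x224 image

def sampled_context_patches(scale_range, n=num_patches):
    # The context-mask scale is drawn from the given range; fewer kept patches
    # means smaller activations and attention maps in the context encoder.
    s = random.uniform(*scale_range)
    return math.ceil(s * n)

print(sampled_context_patches((0.85, 1.0)))   # larger scale -> more patches, more memory
print(sampled_context_patches((0.60, 0.80)))  # a reduced range to try if memory is tight
```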
We do not use the bare cross-attention module by default, but this is a valid update. Please make sure to follow the steps in [CONTRIBUTING.md](https://github.com/facebookresearch/jepa/blob/main/CONTRIBUTING.md) to be able to...
Yes, that's correct; I just mean that by default it is combined with a small MLP after the cross-attention. Please check off the Contributor License Agreement so that I can merge...
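A hedged sketch of that combination (a learnable query cross-attending to the tokens, followed by a small MLP); module names, dimensions, and the use of `nn.MultiheadAttention` are assumptions for illustration, not the repo's implementation:

```python
import torch
import torch.nn as nn

class CrossAttentionPooler(nn.Module):
    """Cross-attention from a learnable query over the tokens, plus a small MLP."""

    def __init__(self, dim=768, num_heads=12, mlp_ratio=4.0):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, tokens):                              # tokens: (B, N, dim)
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.cross_attn(q, tokens, tokens)      # bare cross-attention
        return pooled + self.mlp(self.norm(pooled))         # ...combined with a small MLP
```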