Pavinags

2 issues by Pavinags

When using Cityscapes data of size (780, 780), the code throws the error shown here. ![image](https://github.com/facebookresearch/ijepa/assets/134510149/4948f3c2-f282-4a37-ac7b-02d7e5f65052) Why do we subtract 1 in lines 428 and 429, and why is `class_emb` extracted separately? This causes...
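For context on the question above, here is a minimal sketch of the arithmetic that a `- 1` usually stands for in ViT-style position-embedding code. It assumes, without confirmation from the issue, that index 0 of the embedding table is reserved for a class token, so the patch count is the token count minus one. The helper `grid_side` and the patch size of 16 are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
import math


def grid_side(num_tokens: int, has_class_token: bool) -> int:
    """Return the side length of the square patch grid.

    Hypothetical helper: drops one token when a class token is assumed,
    then checks that the remaining tokens form a perfect square.
    """
    num_patches = num_tokens - 1 if has_class_token else num_tokens
    side = math.isqrt(num_patches)
    if side * side != num_patches:
        raise ValueError(
            f"{num_patches} patch tokens do not form a square grid"
        )
    return side


image_size = 780   # from the issue
patch_size = 16    # assumption, not stated in the issue
patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# With a class token in front, subtracting 1 recovers the square patch grid.
side_with_cls = grid_side(num_patches + 1, has_class_token=True)

# Without a class token, no subtraction is needed.
side_without_cls = grid_side(num_patches, has_class_token=False)

# Subtracting 1 when there is no class token leaves a non-square count,
# so a later reshape to (side, side) could not succeed.
try:
    grid_side(num_patches, has_class_token=True)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

If the model in question has no class token, the subtraction and the separate `class_emb` slice would remove a real patch embedding, which might be one possible source of a shape error at non-default resolutions. Whether that is the actual cause here cannot be determined from the issue text alone.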

Where exactly do we pass the region embeddings as tokens to the transformer encoder? As far as I can see, the token and the affinity are both defined in the decoder head.