Drinky
Hi, I found that the result of MoCo v2 in Table 1 of the SoCo paper is particularly high (40.4 bbox AP). But the experiment I have run shows that the...
Do you use token labels? If not, maybe you should change `return_dense=True, mix_token=True` to `return_dense=False, mix_token=False` in https://github.com/sail-sg/volo/blob/068a58399e5a8f7fbb1b348522a96c87148caeef/models/volo.py#L462
The `reconstruction` is the output of the VAE, which usually looks the same as the `input` because the VAE is an autoencoder. The `samples` and `samples_cfg_scale_3.00` are the generated results...
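To illustrate the difference, here is a minimal sketch using a toy autoencoder (the class `TinyAE` and its dimensions are hypothetical, not the repo's actual model): a reconstruction passes an input through encoder and decoder, so it resembles that input, while a sample decodes a latent drawn from the prior, with no input involved.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Toy autoencoder to illustrate 'reconstruction' vs. 'samples'."""
    def __init__(self, d_in=8, d_z=2):
        super().__init__()
        self.d_z = d_z
        self.enc = nn.Linear(d_in, d_z)
        self.dec = nn.Linear(d_z, d_in)

    def reconstruct(self, x):
        # "reconstruction": encode then decode the given input,
        # so the output resembles x.
        return self.dec(self.enc(x))

    def sample(self, n):
        # "samples": decode latents drawn from the prior --
        # newly generated outputs, no input involved.
        z = torch.randn(n, self.d_z)
        return self.dec(z)

vae = TinyAE()
rec = vae.reconstruct(torch.randn(4, 8))  # same shape as the input batch
samp = vae.sample(3)                      # 3 new generations
```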
I am not the author, but I hope my answer can help you. A1: Because the neighbor of a patch is defined within the same view. The "neighbor" is not...
This is because the SelfPatch checkpoint does not contain the CLS token, so the position embedding's size does not match. In SelfPatch, the CLS token lives in the SelfPatchHead (https://github.com/alinlab/SelfPatch/blob/main/selfpatch_vision_transformer.py#L362), so...
You should first make sure you delete the CLS token in the ViT. Then you can insert `x = x.mean(dim=1)` after `x = self.norm(x)` and return the...
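The steps above can be sketched as follows (a hypothetical readout module `MeanPoolHead` with assumed shapes and embedding dim; not the actual SelfPatch code): drop the CLS token, apply the norm, then mean-pool the remaining patch tokens.

```python
import torch
import torch.nn as nn

class MeanPoolHead(nn.Module):
    """Sketch: read out a ViT by mean-pooling patch tokens instead of using CLS."""
    def __init__(self, dim=384):
        super().__init__()
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, 1 + num_patches, dim), where token 0 is CLS
        x = x[:, 1:, :]      # delete the CLS token first
        x = self.norm(x)     # the existing `x = self.norm(x)` line
        x = x.mean(dim=1)    # inserted mean pooling over patch tokens
        return x             # (batch, dim)

feats = MeanPoolHead(dim=384)(torch.randn(2, 1 + 196, 384))
```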
To overcome the performance drop, I recommend copying the SelfPatch ViT into the DINO ViT. The main difference between them is: > SelfPatch uses the CA block after the ViT...
I mean you should replace the DINO ViT model's code with the SelfPatch ViT model's code.