DINO
About the token Selection
Nice work! When selecting tokens from the encoder output, the class_embedding has an output dimension of 91, which includes the "no object" category. Does selecting tokens this way affect the results?
We use focal loss, so there is no "no object" class among the outputs. You can also view it as multiple independent binary classifications, one per category.
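To illustrate the "multiple binary classifications" view, here is a minimal sketch in plain Python, not the repository's actual code. It assumes each token is scored by its highest per-class sigmoid probability and that the top-k tokens by that score are kept. The function name `select_top_k_tokens` and the toy logits are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function: maps a logit to an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))


def select_top_k_tokens(class_logits, k):
    """Rank tokens by their best per-class sigmoid score and keep the top k.

    class_logits: one list of per-class logits for each encoder token.
    Returns (indices of the selected tokens, per-token scores).
    Assumption: every class is an independent binary classifier, so there
    is no dedicated "no object" column; background shows up as all
    classes having a low score.
    """
    scores = [max(sigmoid(v) for v in row) for row in class_logits]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


# Toy example with 4 tokens and 3 classes (hypothetical values).
logits = [
    [-4.0, -3.5, -5.0],  # every class unlikely: behaves like background
    [2.0, -3.0, -4.0],   # confident about class 0
    [-1.0, 0.5, -2.0],   # weakly class 1
    [-6.0, -6.0, 3.0],   # confident about class 2
]
selected, scores = select_top_k_tokens(logits, k=2)
print(selected)
```

Under these assumptions, a background token is never picked because of a high "no object" score. It simply ranks low, since none of its binary classifiers fires.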