MUST
Discrepancy in Accuracy without Distributed Mode
Hello, I am running your code on a single GPU (without distributed mode), but the results differ significantly from those reported in your paper. For example, on the DTD dataset I get 50.1% on a single GPU, whereas the paper reports 54.1% with ViT-B/16. Is this variation expected?