Reproducing no-filtering baseline `commonpool_s_s13m_b4k`
I'm trying to replicate the basic results without any filtering for now. There are some major differences between the models I trained locally and the ones on Hugging Face: the accuracy of the locally trained models is worse, and with linear probing the gap is even larger (about 20% accuracy for a locally trained model vs. 44% for the Hugging Face model on CIFAR100).
| Dataset | Encoder | Zero-shot Test Acc. | Linear Probe Test Acc. (mean ± std) |
|---|---|---|---|
| cifar10 | commonpool_s_s13m_b4k | 0.4077 | 0.685 ± 0.0014 |
| cifar10 | local_commonpool_s_s13m_b4k_0 | 0.3572 | 0.4694 ± 0.0106 |
| cifar10 | local_commonpool_s_s13m_b4k_1 | 0.3443 | 0.4565 ± 0.0143 |
| cifar10 | local_commonpool_s_s13m_b4k_3 | 0.3406 | 0.4609 ± 0.0126 |
| cifar10 | local_commonpool_s_s13m_b4k_4 | 0.3346 | 0.469 ± 0.0141 |
| cifar10 | local_commonpool_s_s13m_b4k_2 | 0.3323 | 0.4447 ± 0.0164 |
| vtab/cifar100 | commonpool_s_s13m_b4k | 0.1297 | 0.4355 ± 0.0025 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_1 | 0.1246 | 0.2024 ± 0.0035 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_0 | 0.1168 | 0.1997 ± 0.0085 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_3 | 0.1139 | 0.2004 ± 0.0066 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_2 | 0.1138 | 0.2002 ± 0.0043 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_4 | 0.1128 | 0.2047 ± 0.0044 |
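
The linear-probe and zero-shot numbers above come from the repo's evaluation script. Here is a minimal sketch of one such run; the `evaluate.py` flag names are my reading of the DataComp README, and the paths are placeholders for my local setup:

```bash
# Minimal sketch of one evaluation run; the --train_output_dir / --data_dir
# flag names are assumed from the DataComp README, and the paths are
# placeholders for my local setup.
python evaluate.py \
    --train_output_dir ./train_output/local_commonpool_s_s13m_b4k_0 \
    --data_dir ./eval_datasets
```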
- To my understanding, just calling `train.py --scale small` on the unmodified CommonPool dataset should replicate the no-filtering baseline `commonpool_s_s13m_b4k` (the full invocation I used is sketched below). Is that right?
- I ran five different seeds for the pretraining and, for each, ten different seeds for the linear probing. Why are the results so different from the online models?
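
For concreteness, here is roughly what each pretraining run looked like. This is a sketch assuming a single-node `torchrun` launch as in the DataComp README; the GPU count, paths, experiment name, and the `--seed` flag are assumptions about my local setup rather than verified repo defaults:

```bash
# Sketch of one pretraining run. --nproc_per_node, the paths, and the
# experiment name are placeholders; --seed is an assumption (I varied it
# from 0 to 4 across the five runs).
torchrun --nproc_per_node 8 train.py \
    --scale small \
    --data_dir ./commonpool_small/shards \
    --output_dir ./train_output \
    --exp_name local_commonpool_s_s13m_b4k_0 \
    --seed 0
```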