Reproducing no-filtering baseline `commonpool_s_s13m_b4k`
I'm trying to replicate the basic results without any filtering for now. There are some major differences between the models I trained locally and the ones on Hugging Face: the accuracy of the locally trained models is worse, and with linear probing the gap is even larger (about 20% accuracy for a locally trained model vs. 44% for the Hugging Face model on CIFAR100).
| Dataset | Encoder | Zero-shot Test Acc. | Linear Probe Test Acc. (mean ± std) |
|---|---|---|---|
| cifar10 | commonpool_s_s13m_b4k | 0.4077 | 0.685 ± 0.0014 |
| cifar10 | local_commonpool_s_s13m_b4k_0 | 0.3572 | 0.4694 ± 0.0106 |
| cifar10 | local_commonpool_s_s13m_b4k_1 | 0.3443 | 0.4565 ± 0.0143 |
| cifar10 | local_commonpool_s_s13m_b4k_3 | 0.3406 | 0.4609 ± 0.0126 |
| cifar10 | local_commonpool_s_s13m_b4k_4 | 0.3346 | 0.469 ± 0.0141 |
| cifar10 | local_commonpool_s_s13m_b4k_2 | 0.3323 | 0.4447 ± 0.0164 |
| vtab/cifar100 | commonpool_s_s13m_b4k | 0.1297 | 0.4355 ± 0.0025 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_1 | 0.1246 | 0.2024 ± 0.0035 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_0 | 0.1168 | 0.1997 ± 0.0085 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_3 | 0.1139 | 0.2004 ± 0.0066 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_2 | 0.1138 | 0.2002 ± 0.0043 |
| vtab/cifar100 | local_commonpool_s_s13m_b4k_4 | 0.1128 | 0.2047 ± 0.0044 |
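
The linear-probe and zero-shot numbers above come from the repo's evaluation script. Here is a minimal sketch of one such run; the `evaluate.py` flag names are my reading of the DataComp README, and the paths are placeholders for my local setup:

```bash
# Minimal sketch of one evaluation run; the --train_output_dir / --data_dir
# flag names are assumed from the DataComp README, and the paths are
# placeholders for my local setup.
python evaluate.py \
    --train_output_dir ./train_output/local_commonpool_s_s13m_b4k_0 \
    --data_dir ./eval_datasets
```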
- To my understanding, just calling `train.py --scale small` on the unmodified CommonPool dataset should replicate the no-filtering baseline `commonpool_s_s13m_b4k` (the full invocation I used is sketched below). Is that right?
- I ran five different seeds for the pretraining and, for each, ten different seeds for the linear probing. Why are the results so different from the online models?
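
For concreteness, here is roughly what each pretraining run looked like. This is a sketch assuming a single-node `torchrun` launch as in the DataComp README; the GPU count, paths, experiment name, and the `--seed` flag are assumptions about my local setup rather than verified repo defaults:

```bash
# Sketch of one pretraining run. --nproc_per_node, the paths, and the
# experiment name are placeholders; --seed is an assumption (I varied it
# from 0 to 4 across the five runs).
torchrun --nproc_per_node 8 train.py \
    --scale small \
    --data_dir ./commonpool_small/shards \
    --output_dir ./train_output \
    --exp_name local_commonpool_s_s13m_b4k_0 \
    --seed 0
```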