ssdg-benchmark Not able to reproduce numbers for PACS dataset.

I tried running your code but was unable to reproduce the numbers for the PACS dataset. The results I obtained for 5 samples per class, alongside the numbers reported in the paper, are as follows.

Domain         Obtained   Reported
Art_Painting   77.18      78.54
Cartoon        73.74      74.44
Photo          89.35      89.25
Sketch         76.5       79.06
Avg            79.1925    80.32

I can understand a difference of about 1 percentage point in the first three domains, but two things confuse me.

First, the number reported for the sketch domain is nearly 3 percentage points higher than what I obtained.
Second, the number quoted for 5 samples per class is greater than the number reported in the paper for 10 samples per class.
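For reference, the per-domain gaps can be tabulated with a short script. This is only a sketch: the accuracies are copied from the table in this post, and the names `results`, `gaps`, and `average` are illustrative, not part of ssdg-benchmark.

```python
# Accuracies (%) copied from the table in this post: (obtained, reported).
results = {
    "art_painting": (77.18, 78.54),
    "cartoon": (73.74, 74.44),
    "photo": (89.35, 89.25),
    "sketch": (76.50, 79.06),
}


def gaps(table):
    """Return reported minus obtained accuracy per domain, in percentage points."""
    return {
        domain: round(reported - obtained, 2)
        for domain, (obtained, reported) in table.items()
    }


def average(values):
    """Plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


domain_gaps = gaps(results)
for domain, gap in domain_gaps.items():
    print(f"{domain:<13} gap: {gap:+.2f} pp")

avg_obtained = average(obtained for obtained, _ in results.values())
avg_reported = average(reported for _, reported in results.values())
print(f"average obtained: {avg_obtained:.2f}")
print(f"average reported: {avg_reported:.2f}")
```

The sketch domain accounts for most of the difference in the averages; the photo domain is actually slightly higher in my run.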

Can you please clarify this?

Jul 27 '22 14:07 Griffintaur

In my case, the results across different seeds don't deviate too much. Please see below:

Parsing output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed1/log.txt. acc: 78.37%. err: 21.63%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed2/log.txt. acc: 80.66%. err: 19.34%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed3/log.txt. acc: 79.00%. err: 21.00%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed4/log.txt. acc: 76.46%. err: 23.54%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed5/log.txt. acc: 78.22%. err: 21.78%
===
outcome of directory: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting
* acc: 78.54% +- 1.35%
===
Parsing output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed1/log.txt. acc: 70.35%. err: 29.65%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed2/log.txt. acc: 77.65%. err: 22.35%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed3/log.txt. acc: 72.70%. err: 27.30%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed4/log.txt. acc: 77.01%. err: 22.99%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed5/log.txt. acc: 74.49%. err: 25.51%
===
outcome of directory: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon
* acc: 74.44% +- 2.71%
===
Parsing output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed1/log.txt. acc: 89.46%. err: 10.54%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed2/log.txt. acc: 84.91%. err: 15.09%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed3/log.txt. acc: 91.14%. err: 8.86%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed4/log.txt. acc: 91.56%. err: 8.44%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed5/log.txt. acc: 89.16%. err: 10.84%
===
outcome of directory: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo
* acc: 89.25% +- 2.36%
===
Parsing output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed1/log.txt. acc: 77.06%. err: 22.94%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed2/log.txt. acc: 81.64%. err: 18.36%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed3/log.txt. acc: 76.93%. err: 23.07%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed4/log.txt. acc: 79.40%. err: 20.60%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed5/log.txt. acc: 80.27%. err: 19.73%
===
outcome of directory: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch
* acc: 79.06% +- 1.83%
===
overall average
* acc: 80.32%
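The summary lines above can be re-derived from the per-seed accuracies. The following is a minimal sketch, not the repository's parsing script: the accuracies are copied from the log, and the use of the population standard deviation (dividing by n) is an inference from the printed values, since the sample standard deviation (dividing by n - 1) would give larger numbers.

```python
from statistics import mean, pstdev

# Per-seed accuracies (%) copied from the log output above, seeds 1 to 5.
seed_acc = {
    "art_painting": [78.37, 80.66, 79.00, 76.46, 78.22],
    "cartoon": [70.35, 77.65, 72.70, 77.01, 74.49],
    "photo": [89.46, 84.91, 91.14, 91.56, 89.16],
    "sketch": [77.06, 81.64, 76.93, 79.40, 80.27],
}


def summarize(accs):
    """Return (mean, population std) of a list of accuracies."""
    # Assumption: population std (divide by n) matches the "+-" values in the log.
    return mean(accs), pstdev(accs)


summary = {domain: summarize(accs) for domain, accs in seed_acc.items()}
for domain, (avg, std) in summary.items():
    print(f"{domain:<13} acc: {avg:.2f}% +- {std:.2f}%")

# The overall figure is the unweighted mean of the four per-domain means.
overall = mean([avg for avg, _ in summary.values()])
print(f"overall average acc: {overall:.2f}%")
```

With five seeds per domain, swapping `pstdev` for `stdev` would inflate each spread by a factor of about 1.12, so it is worth confirming which one a script uses before comparing deviations between runs.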

Just to double-check: did you run each experiment 5 times, as programmed? Did you use the data provided by this code?

Jul 28 '22 13:07 KaiyangZhou

Yes, I used the exact same settings and the data you provided to run the experiments.

Jul 28 '22 17:07 Griffintaur

Hmm, it's strange that you got such a large deviation on the sketch domain. I can't explain it. In my experience, though, the PACS dataset and its protocol sometimes lead to large deviations in results, which my colleagues have also observed. Anyway, let me know if you figure out the cause.
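One rough way to judge how large each gap is relative to seed-to-seed noise is to divide it by the reported standard deviation, and by the standard error of a 5-seed mean. This is a heuristic sketch, not a formal test: the spread of the reproduced runs was not posted, so only the reported spread is used, and `gap_in_std_units` is a hypothetical helper.

```python
from math import sqrt

N_SEEDS = 5  # each domain was run with 5 seeds

# (obtained mean, reported mean, reported std across seeds), all in %,
# copied from the posts above.
domains = {
    "art_painting": (77.18, 78.54, 1.35),
    "cartoon": (73.74, 74.44, 2.71),
    "photo": (89.35, 89.25, 2.36),
    "sketch": (76.50, 79.06, 1.83),
}


def gap_in_std_units(obtained, reported, std, n_seeds=N_SEEDS):
    """Return the gap divided by the std and by the standard error of the mean."""
    gap = reported - obtained
    standard_error = std / sqrt(n_seeds)
    return gap / std, gap / standard_error


for domain, (obtained, reported, std) in domains.items():
    in_std, in_se = gap_in_std_units(obtained, reported, std)
    print(f"{domain:<13} gap = {in_std:+.2f} std = {in_se:+.2f} standard errors")
```

Under these assumptions the sketch gap is about 1.4 standard deviations of a single run, or roughly 3 standard errors of the 5-seed mean, which is larger than for the other three domains.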

Jul 29 '22 07:07 KaiyangZhou

Could you let me know the development environment in which you ran those experiments? The numbers seem too high, at least for the sketch domain. Alternatively, if possible, could you rerun the sketch domain to see whether the results are reproducible?

Aug 06 '22 11:08 Griffintaur

FYI

Collecting env info ...
** System info **
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Clang version: Could not collect
CMake version: version 2.8.12.2

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: Tesla V100-PCIE-32GB
GPU 1: Tesla V100-PCIE-32GB
GPU 2: Tesla V100-PCIE-32GB
GPU 3: Tesla V100-PCIE-32GB
GPU 4: Tesla V100-PCIE-32GB
GPU 5: Tesla V100-PCIE-32GB
GPU 6: Tesla V100-PCIE-32GB
GPU 7: Tesla V100-PCIE-32GB

Nvidia driver version: 418.67
cuDNN version: /usr/local/cuda-9.0/lib64/libcudnn.so.7.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.1.243             h6bb024c_0
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py37he8ac12f_0
[conda] mkl_fft                   1.3.0            py37h54f3939_0
[conda] mkl_random                1.1.1            py37h0573a6f_0
[conda] numpy                     1.20.1                   pypi_0    pypi
[conda] numpy-base                1.19.2           py37hfa32c7d_0
[conda] pytorch                   1.7.1           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
[conda] torchvision               0.8.2                py37_cu101    pytorch
        Pillow (8.1.0)
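To compare another machine against the versions listed above, a small standard-library script can report what is installed locally and flag mismatches. This is a sketch under stated assumptions: the reference versions are copied from the dump above, and `installed_versions` and `diff_versions` are hypothetical helpers, not part of ssdg-benchmark or PyTorch.

```python
from importlib import metadata
import platform

# Reference versions copied from the environment dump above.
REFERENCE = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "numpy": "1.20.1",
    "Pillow": "8.1.0",
}


def installed_versions(packages=tuple(REFERENCE)):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(mine, reference):
    """Return {package: (my version, reference version)} for every mismatch."""
    return {
        name: (mine.get(name), ref)
        for name, ref in reference.items()
        if mine.get(name) != ref
    }


local = installed_versions()
print(f"Python {platform.python_version()}")
for name, (have, want) in diff_versions(local, REFERENCE).items():
    print(f"{name}: local {have}, reference {want}")
```

This covers only package versions; the CUDA, cuDNN, and driver versions in the dump would still need to be compared by hand.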

Aug 07 '22 02:08 KaiyangZhou
