CapDec: About the variance of 0.016

Hi, authors. In your paper, you mentioned:

> Specifically, we set $\epsilon$ to the mean $\ell_\infty$ norm of embedding differences between five captions that correspond to the same image. We estimated this based on captions of only 15 MS-COCO images.
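As a rough illustration of the quoted definition, here is a minimal pure-Python sketch. It assumes each image comes with a list of caption embeddings (plain lists of floats) and that "embedding differences" means all pairwise differences among one image's captions. The pairing scheme and the helper names are hypothetical; this is not the authors' code, and it does not call CLIP.

```python
from itertools import combinations
from statistics import mean


def linf_norm(v):
    """l_inf norm: the largest absolute component of a vector."""
    return max(abs(x) for x in v)


def mean_linf_diff(caption_embeddings):
    """Mean l_inf norm over all pairwise differences of one image's caption embeddings.

    Assumption: every unordered pair of captions is compared (10 pairs for 5 captions).
    """
    return mean(
        linf_norm([a - b for a, b in zip(u, v)])
        for u, v in combinations(caption_embeddings, 2)
    )


def estimate_epsilon(images):
    """Hypothetical helper: average the per-image value over a set of images (e.g. 15)."""
    return mean(mean_linf_diff(caps) for caps in images)
```

With real data, `images` would hold the text embeddings of the five captions for each sampled image; the toy vectors in any quick check are only for illustration.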

I would like to know: have you released the code used to estimate this variance?

Nov 14 '22 12:11 232525

Hi, yes. We calculate it using the `predictions_runner.py` script with the flags `--ablation_dist` and `--text_autoencoder` (the latter specifies using the CLIP text encoder rather than an image encoder; see its main method there, `calc_distances_of_ready_embeddings`). This prints the average max norm over 900 samples, but it also saves the 900 individual values to a pickle file, so you can verify that the estimate can also be made from only 15 samples with high confidence (i.e., the variance of the max norm across different sets of samples is negligible).
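The 15-versus-900 check described above could be sketched as follows. This assumes the saved per-sample values have already been loaded (for example with `pickle.load`) into a flat list of floats; the pickle file's actual name and layout are not given in this thread, so that part is an assumption, and `subset_means` / `summarize` are hypothetical helper names. The sketch draws many random 15-sample subsets and measures how much their means spread around the full-set mean.

```python
import random
from statistics import mean, pstdev


def subset_means(values, subset_size=15, n_trials=1000, seed=0):
    """Means of many random subsets (drawn without replacement) of the per-sample values."""
    rng = random.Random(seed)  # seeded so the check is reproducible
    return [mean(rng.sample(values, subset_size)) for _ in range(n_trials)]


def summarize(values, subset_size=15, n_trials=1000, seed=0):
    """Return (full-set mean, average subset mean, std. dev. of subset means)."""
    means = subset_means(values, subset_size, n_trials, seed)
    return mean(values), mean(means), pstdev(means)
```

If the standard deviation of the subset means is small relative to the full-set mean, a 15-image estimate is close to the 900-sample one, which is what "negligible variance" would look like in practice.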

Nov 14 '22 12:11 DavidHuji

Thanks for your reply. I calculated the average on the val set, and the result looks normal. But one thing confuses me: your paper mentions the $\ell_\infty$ norm, but it seems that the $\ell_1$ norm result is the one used. Is there something I missed or misunderstood?

Nov 15 '22 02:11 232525

It prints a few different metrics, but the infinity norm is the one printed here (line 86).
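To see why an $\ell_1$ figure and an $\ell_\infty$ figure can look very different, here is a minimal sketch computing both norms of the same embedding difference. The helper names and the 512-dimensional toy vectors are purely illustrative assumptions, not taken from the repository: the $\ell_1$ norm sums absolute values over all dimensions and therefore grows with dimension, while the $\ell_\infty$ norm keeps only the largest absolute component.

```python
def l1_norm(v):
    """l_1 norm: sum of absolute components."""
    return sum(abs(x) for x in v)


def linf_norm(v):
    """l_inf norm: largest absolute component."""
    return max(abs(x) for x in v)


def compare_norms(u, v):
    """Both norms of the element-wise difference u - v (hypothetical helper)."""
    diff = [a - b for a, b in zip(u, v)]
    return {"l1": l1_norm(diff), "linf": linf_norm(diff)}


# Toy example: a constant gap of 0.25 in each of 512 dimensions.
print(compare_norms([0.25] * 512, [0.0] * 512))  # {'l1': 128.0, 'linf': 0.25}
```

So when several metrics are printed side by side, the $\ell_\infty$ value is the one on the scale of a per-dimension noise level, whereas the $\ell_1$ value is typically much larger.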

Nov 15 '22 09:11 DavidHuji