pcmepp
The problem with formulas
For the binary loss, the final derivatives seem to differ from the paper: for m=0 and m=1, I calculated the results as -sigmoid(l_vt) and sigmoid(-l_vt), respectively.
This answer is wrong. Please check my additional comment below.
It is because sigmoid is defined as a compressed version in this paper, i.e., $\text{sigmoid}(x) = \frac{\exp(x)}{\exp(x) + \exp(-x)}$, where usually we define $\text{sigmoid}_0(x) = \frac{1}{1 + \exp(-x)}$. Namely, $\text{sigmoid}(x) = \text{sigmoid}_0(2x)$.
Therefore, when m=0, the derivative becomes $\frac{d \log \text{sigmoid}(x)}{dx} = 2(1 - \text{sigmoid}(x))$, and when m=1, the derivative becomes $\frac{d \log \text{sigmoid}(-x)}{dx} = -2(1 - \text{sigmoid}(-x))$.
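The two identities for the compressed sigmoid can be checked numerically. The snippet below is only a minimal sketch of the calculus, not code from the PCME or PCME++ repositories; the function names, the test point `x = 0.7`, and the step size are all assumptions chosen for illustration. The second check differentiates $\log \text{sigmoid}(-x)$, which is the expression whose derivative equals $-2(1 - \text{sigmoid}(-x))$.

```python
import math

def sigmoid0(x):
    """Standard logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def compressed_sigmoid(x):
    """Compressed sigmoid: exp(x) / (exp(x) + exp(-x)), i.e. sigmoid0(2x)."""
    return math.exp(x) / (math.exp(x) + math.exp(-x))

def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 0.7  # arbitrary test point (assumption for illustration)

# d/dx log sigmoid(x) compared with 2 * (1 - sigmoid(x))
numeric_pos = numeric_derivative(lambda t: math.log(compressed_sigmoid(t)), x)
analytic_pos = 2.0 * (1.0 - compressed_sigmoid(x))

# d/dx log sigmoid(-x) compared with -2 * (1 - sigmoid(-x))
numeric_neg = numeric_derivative(lambda t: math.log(compressed_sigmoid(-t)), x)
analytic_neg = -2.0 * (1.0 - compressed_sigmoid(-x))

print(numeric_pos, analytic_pos)
print(numeric_neg, analytic_neg)
```

With the standard `sigmoid0` in place of the compressed version, the same check would give the derivatives without the factors of 2.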
Hmm, I just realized that I omitted the constant terms ($2$ and $-2$) in the paper. Although the constant terms do not change the claim (false negatives / positives lead to loss saturation), I will revise the camera-ready version to clarify this.
Thanks for your question!
Thank you for your wonderful paper. Your improvements from PCME to this article are refreshing and bring a new perspective to image-text retrieval.
@ahustr Hi, while updating the camera-ready paper, I found that your original derivation was correct and mine was wrong. First, the compressed sigmoid is not used in PCME++; it is used in PCME. I was confused when I replied to this issue. Second, loss saturation happens only for PCME, due to the derivation of the matching probability NLL, not for the PCME++ loss. The two are almost the same, but the matching probability NLL is $-\log \mathbb E [\text{sigmoid} (\ell)]$ and the PCME++ loss is $-\log \text{sigmoid} (\mathbb E [\ell])$, where $\ell$ is defined as $\| Z_v - Z_t \|_2$ (to clarify, PCME++ uses the squared version in order to compute the closed form). The above derivation is also wrong because it differentiates the log-likelihood (LL), not the negative LL.
Namely, PCME++ loss does not directly suffer from the loss saturation issue like PCME. I have updated the camera-ready by correcting the errors. I clarified that the additional techniques are for performance enhancements under abundant FNs rather than avoiding loss saturation. Check the final version of the paper: https://openreview.net/forum?id=ft1mr3WlGM
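To make the difference between $-\log \mathbb E [\text{sigmoid} (\ell)]$ and $-\log \text{sigmoid} (\mathbb E [\ell])$ concrete, here is a small sketch that evaluates both on a handful of made-up values of $\ell$. Everything in it is an assumption for illustration: the sample values are hypothetical, the standard sigmoid is used for both quantities so that only the placement of the expectation differs, and the actual PCME and PCME++ losses include additional terms and parameters that are not modeled here.

```python
import math

def sigmoid0(x):
    """Standard logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values of ell (assumption, not drawn from any real model)
ells = [-3.0, -1.0, 0.5, 2.0, 4.0]

# Expectation taken outside the sigmoid: -log E[sigmoid(ell)]
mean_prob = sum(sigmoid0(l) for l in ells) / len(ells)
nll_expect_outside = -math.log(mean_prob)

# Expectation taken inside the sigmoid: -log sigmoid(E[ell])
mean_ell = sum(ells) / len(ells)
nll_expect_inside = -math.log(sigmoid0(mean_ell))

print(nll_expect_outside, nll_expect_inside)
```

The two printed values differ because the sigmoid is nonlinear, so the expectation and the sigmoid cannot be swapped in general.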
Please check the PCME paper (https://arxiv.org/abs/2101.05068), Section 3.2.2, for more details about the loss saturation.
Thank you for your patience in responding and for your sense of responsibility in research.