pcmepp
The problem with formulas
For the binary loss, the final derivative I get seems to differ from the one in the paper: for m=0 and m=1, I calculated -sigmoid(l_vt) and sigmoid(-l_vt), respectively.
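A quick numerical check may help pin down where the discrepancy comes from. The sketch below assumes the binary objective has the log-likelihood form m * log(sigmoid(l)) + (1 - m) * log(sigmoid(-l)), with l standing in for l_vt. That form and the function names are assumptions made for illustration, not copied from the paper or the repository. Under that assumption, the analytic derivative is compared against a central finite difference.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_likelihood(l, m):
    # Assumed objective (hypothetical form, not taken from the paper):
    # m * log(sigmoid(l)) + (1 - m) * log(sigmoid(-l))
    return m * math.log(sigmoid(l)) + (1 - m) * math.log(sigmoid(-l))


def analytic_grad(l, m):
    # d/dl log(sigmoid(l))  =  sigmoid(-l)
    # d/dl log(sigmoid(-l)) = -sigmoid(l)
    return m * sigmoid(-l) - (1 - m) * sigmoid(l)


def numeric_grad(l, m, eps=1e-6):
    # Central finite difference of the assumed objective.
    return (log_likelihood(l + eps, m) - log_likelihood(l - eps, m)) / (2 * eps)


if __name__ == "__main__":
    for l in (-2.0, 0.5, 3.0):
        for m in (0, 1):
            print(l, m, analytic_grad(l, m), numeric_grad(l, m))
```

With this form, m=0 gives -sigmoid(l) and m=1 gives sigmoid(-l), which matches the values above. If the loss is instead defined as the negative of this expression (the usual binary cross-entropy to be minimised), both signs flip, giving sigmoid(l) for m=0 and -sigmoid(-l) for m=1, so a sign-convention difference alone could explain the mismatch.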