Auxiliary distribution P in the paper
Hi Xiangjie,
May I ask why you chose the specific auxiliary distribution P in the KL divergence in the DESC paper? (It makes sense that squaring shrinks small q's more, but I don't understand the idea of constructing p_{ij} from q_{ij}. P and Q are usually different distributions in a KL divergence.)
In the extreme case where each row of q_{ij} has only one nonzero entry, the distributions p_{ij} and q_{ij} are the same.
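To make the extreme case concrete, here is a small sketch. It assumes the paper's P has the DEC-style form p_{ij} ∝ q_{ij}^2 / f_j with f_j = sum_i q_{ij}; that form is my reading of the paper and is not quoted from it. The function names and toy matrices are my own, for illustration only.

```python
import math

def target_distribution(q):
    """Build an auxiliary distribution p from soft assignments q (rows sum to 1).

    Assumed form (DEC-style): p_ij is proportional to q_ij^2 / f_j,
    where f_j = sum_i q_ij is the soft frequency of cluster j.
    """
    n_clusters = len(q[0])
    # Soft cluster frequencies f_j (column sums of q).
    f = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        # Square each assignment and divide by the cluster frequency;
        # a cluster that receives no mass contributes zero.
        w = [(row[j] ** 2) / f[j] if f[j] > 0 else 0.0 for j in range(n_clusters)]
        total = sum(w)
        # Renormalize so each row of p sums to 1.
        p.append([x / total for x in w])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all rows and clusters, treating 0 * log 0 as 0."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )

# Toy soft assignments: three cells, two clusters.
soft = [[0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
# Extreme case: each row has exactly one nonzero entry.
hard = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

p_soft = target_distribution(soft)
p_hard = target_distribution(hard)

print("sharpened rows:", p_soft)
print("KL for soft q:", kl_divergence(p_soft, soft))
print("KL for one-hot q:", kl_divergence(p_hard, hard))
```

Under this assumed form, a one-hot Q is a fixed point: P equals Q and the KL term vanishes, while for softer assignments P is a sharper version of Q and the KL term is positive.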