Why Cauchy filter?
Hi, this is very interesting work! I was just curious whether you had any insight into why the Cauchy filter worked for the attentive sampler over the latent space and the Gaussian filter didn't. In their discrete forms, these filters can be made to look very similar.
The Cauchy distribution is a stable distribution from the location-scale family. While looking for a way to apply attention, I tried a number of location-scale distributions that have a closed-form representation, as well as various divergence measures.
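For anyone who wants to compare the two filters directly, here is a minimal sketch. It is not the code from this project: the function names, the 32-position grid, and the `loc`/`scale` values are all assumptions chosen for illustration. It builds a normalized discrete Gaussian filter and a normalized discrete Cauchy filter with the same location and scale, then prints the weight each one assigns at the peak and at the far edge of the grid. The one difference it illustrates is the well-known heavier tail of the Cauchy; whether that is what matters for the attentive sampler is not established here.

```python
import math


def normalize(weights):
    """Scale the weights so they sum to 1, like a discrete attention filter."""
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_filter(positions, loc, scale):
    """Discrete Gaussian filter: unnormalized density exp(-z^2 / 2)."""
    return normalize(
        [math.exp(-0.5 * ((x - loc) / scale) ** 2) for x in positions]
    )


def cauchy_filter(positions, loc, scale):
    """Discrete Cauchy filter: unnormalized density 1 / (1 + z^2)."""
    return normalize(
        [1.0 / (1.0 + ((x - loc) / scale) ** 2) for x in positions]
    )


# Hypothetical setup: 32 latent positions, filter centred on position 16.
positions = list(range(32))
loc, scale = 16.0, 2.0

g = gaussian_filter(positions, loc, scale)
c = cauchy_filter(positions, loc, scale)

print("peak weight:  gaussian=%.4f  cauchy=%.4f" % (g[16], c[16]))
print("edge weight:  gaussian=%.3e  cauchy=%.3e" % (g[0], c[0]))
```

Both filters peak at the same position, but the Cauchy weights decay polynomially away from `loc` while the Gaussian weights decay exponentially, so positions far from the centre keep a non-negligible weight under the Cauchy filter.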
Unfortunately, I cannot say what specifically makes the Cauchy distribution work when a Gaussian would not. Sorry, I know that isn't a helpful answer.