
Delta and the distribution

Pozimek opened this issue 3 years ago · 1 comment

I know this is an old repository, but if I understand it correctly, the implemented layer does not correctly compute the distance between the reconstructed tensor and the distribution of the ground-truth tensors. I came close to using this repo, so I hope someone finds this useful.

This update method is the first focus of my contention:

    def update(self, X, X_fit):
        delta = X - X_fit
        self.S = (1 - self.decay) * self.S + self.decay * self.cov(delta)
        self.S_inv = torch.pinverse(self.S)

The goal of the update function is to update the covariance matrix used in the Mahalanobis distance formula. The problem is that it computes the covariance of the difference between the ground-truth data (X) and the reconstructed data (X_fit), rather than the covariance of the ground-truth data (X) itself, as we want. The goal of using this distance metric, at least as far as I understand it, is to bring individual reconstructed samples closer to the mean of the distribution of the ground-truth samples, not to the mean of the ever-changing delta between the model's predictions and the ground truth. The update function should not take X_fit as input at all.
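To see why this matters, here is a quick numeric check (my own toy example, not from the repo): as the reconstruction improves, the covariance of the residual collapses toward zero, while the covariance of the data itself stays fixed by the data distribution.

```python
import torch

torch.manual_seed(0)
# Toy data with per-dimension standard deviations 2, 1 and 0.5.
X = torch.randn(1000, 3) * torch.tensor([2.0, 1.0, 0.5])
X_fit = X + 0.01 * torch.randn_like(X)  # a near-perfect reconstruction

cov_X = torch.cov(X.t())                # covariance of the data
cov_delta = torch.cov((X - X_fit).t())  # covariance of the residual

print(cov_X.diagonal())      # roughly [4, 1, 0.25], set by the data
print(cov_delta.diagonal())  # roughly 1e-4 everywhere, set by the reconstruction error
```

So the S estimated from the delta says nothing about the shape of the ground-truth distribution once the autoencoder fits well.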

The second issue is that the code in this repo treats the multi-dimensional ground-truth variable (x) as a vector of means of the distribution in the forward function:

    def forward(self, x, x_fit):
        delta = x - x_fit
        m = torch.mm(torch.mm(delta, self.S_inv), delta.t())
        return torch.diag(m)
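As a side note (my observation, separate from the main issue): forming the full batch-by-batch product only to take its diagonal is wasteful. The per-sample quadratic form can be computed directly, for example with einsum:

```python
import torch

torch.manual_seed(0)
delta = torch.randn(8, 3)
S_inv = torch.eye(3)

# Full (batch, batch) matrix, then discard everything off the diagonal.
full = torch.diag(delta @ S_inv @ delta.t())
# Same per-sample values, computed without the (batch, batch) intermediate.
direct = torch.einsum("bi,ij,bj->b", delta, S_inv, delta)

print(torch.allclose(full, direct, atol=1e-5))  # True
```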

But x is not a vector of means of the ground-truth distribution; it is a batch of variables drawn from the ground-truth distribution. A lazy fix would be to replace the delta computation with `delta = x_fit - torch.mean(x, dim=0)` to take the mean of the batch. A more rigorous fix would rewrite the update function to keep a running mean of the ground-truth distribution as training progresses (just as it now keeps a running covariance matrix of the delta), and have the forward function use it. Like so:

    def forward(self, x, x_fit):
        self.update(x)
        delta = self.x_mean - x_fit
        m = torch.mm(torch.mm(delta, self.S_inv), delta.t())
        return torch.diag(m)

    def update(self, X):
        self.S = (1 - self.decay) * self.S + self.decay * self.cov(X)
        self.S_inv = torch.pinverse(self.S)
        self.x_mean = (1 - self.decay) * self.x_mean + self.decay * torch.mean(X, dim=0)
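For completeness, here is a self-contained runnable sketch of what the fixed layer could look like. The class name, the buffer initialization, and the cov helper are my own illustrative additions; the repo may define these differently.

```python
import torch


class MahalanobisLayer(torch.nn.Module):
    """Sketch of the proposed fix: the running mean and covariance are
    estimated from the ground-truth data X only (names and initial
    values are illustrative, not taken from the repo)."""

    def __init__(self, dim, decay=0.1):
        super().__init__()
        self.decay = decay
        # Buffers so the running statistics move with the module
        # but are not treated as trainable parameters.
        self.register_buffer("S", torch.eye(dim))
        self.register_buffer("S_inv", torch.eye(dim))
        self.register_buffer("x_mean", torch.zeros(dim))

    @staticmethod
    def cov(X):
        # Sample covariance of a (batch, dim) tensor.
        d = X - X.mean(dim=0, keepdim=True)
        return d.t() @ d / (X.shape[0] - 1)

    def update(self, X):
        # Exponential moving averages of the data statistics.
        self.S = (1 - self.decay) * self.S + self.decay * self.cov(X)
        self.S_inv = torch.pinverse(self.S)
        self.x_mean = (1 - self.decay) * self.x_mean + self.decay * X.mean(dim=0)

    def forward(self, x, x_fit):
        self.update(x)
        delta = self.x_mean - x_fit  # the mean broadcasts over the batch
        m = delta @ self.S_inv @ delta.t()
        return torch.diag(m)


torch.manual_seed(0)
layer = MahalanobisLayer(3)
out = layer(torch.randn(8, 3), torch.randn(8, 3))
print(out.shape)  # torch.Size([8]): one squared distance per sample
```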

Let me know if I made any mistakes.

Pozimek · Apr 21 '21 16:04

Thanks for the report, these are interesting points. I had to dive into the code again to reconstruct my thinking from when I created the repo.

I do believe the current implementation reflects how I intended the experiment. I wanted to see what would happen if we minimized a loss taken over the Mahalanobis distance between ground truth and reconstruction, instead of one taken over the Euclidean distance between the two (as in a normal autoencoder). I was particularly interested in what would happen to the learned manifold in the low-dimensional space, and whether this would allow the autoencoder to get a tighter fit to the encoded data.

Indeed, this implies calculating the Mahalanobis distance over the ever-changing delta, which is also where I ran into difficulties. During training I observed an oscillating pattern in the loss, which I believe was caused by the interaction between the gradient descent updates and the decaying updates to the covariance matrix. This made it difficult to get the autoencoder to converge.

From your description and the code snippets, I believe you expected the autoencoder to minimize the Mahalanobis distance between the reconstruction and the mean of the original distribution. Provided that the training data is fixed, the (estimate of the) mean of the original distribution is fixed, so the autoencoder would essentially be learning a mapping from the input space to a single fixed point: the estimated distribution mean. Because of the scaling by the covariance matrix, errors in some directions (those corresponding to low variance in the input space) would be penalized more heavily than errors in others (those corresponding to high variance), but in the end all reconstructions would be equal. Is this what you intended?
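A tiny toy illustration of this degenerate solution (my own numbers, purely for intuition): with a fixed target mean, the per-sample Mahalanobis loss is zero exactly when every reconstruction equals that mean, regardless of the input.

```python
import torch

# Hypothetical fixed statistics, not taken from the repo.
x_mean = torch.tensor([1.0, -2.0])
S_inv = torch.eye(2)


def mahalanobis_sq(x_fit):
    """Per-sample squared Mahalanobis distance to the fixed mean."""
    delta = x_mean - x_fit
    return torch.diag(delta @ S_inv @ delta.t())


collapsed = x_mean.expand(4, 2)  # every sample mapped to the mean
varied = torch.randn(4, 2)       # a non-degenerate reconstruction

print(mahalanobis_sq(collapsed))  # all zeros: the loss is fully minimized
print(mahalanobis_sq(varied))     # positive for samples away from the mean
```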

Another alternative would be to calculate the Mahalanobis distance over the input data as a pre-processing step, and then feed the result into the autoencoder. The autoencoder would then be reconstructing Mahalanobis distances, minimizing the Euclidean distance between the input (ground-truth) Mahalanobis distance and the reconstructed Mahalanobis distance.
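One way to make this pre-processing idea concrete (this is my interpretation, not necessarily what was meant): whiten the data with the Cholesky factor of the inverse covariance. Squared Euclidean distance in the whitened space then equals squared Mahalanobis distance in the original space, so an ordinary Euclidean autoencoder on the whitened data optimizes a Mahalanobis-like objective.

```python
import torch

torch.manual_seed(0)
# Toy data with very different scales per dimension.
X = torch.randn(500, 3) * torch.tensor([3.0, 1.0, 0.3])

mean = X.mean(dim=0)
cov_inv = torch.linalg.inv(torch.cov(X.t()))

# With cov_inv = L @ L.T (Cholesky), the squared norm of (x - mean) @ L
# equals the squared Mahalanobis distance of x to the mean.
L = torch.linalg.cholesky(cov_inv)
X_white = (X - mean) @ L

d2_mahalanobis = torch.einsum("bi,ij,bj->b", X - mean, cov_inv, X - mean)
d2_euclidean = (X_white ** 2).sum(dim=1)
print(torch.allclose(d2_mahalanobis, d2_euclidean, atol=1e-2))  # True
```

The whitening matrix is computed once from the training data, so it stays fixed during training and avoids the oscillation problem described above.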

Since more people have taken an interest in this repo and the intent can indeed be interpreted in multiple ways, I'll leave this issue open so that it might be useful to others.

bflammers · Apr 26 '21 21:04