
nan/divide by zero issues in loss function with sparse data input

Open tarachari3 opened this issue 4 years ago • 6 comments

A sparse input matrix (30k cells x 6,000 genes, single-cell sequencing data, with 35 labels for the cells) produces NaNs in the loss function during training. Adding a small epsilon (1e-12) to the softmax calculation to prevent division by zero does not seem to improve training either: the loss does not decrease. Is there a possible issue with the distance calculation on sparse input, or is further parameter tuning necessary?
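For reference, here is a minimal NumPy sketch of one way this symptom can arise. It is not torchnca's code. It assumes the loss is the textbook NCA form, a softmax over negative squared distances, and the function names and toy data are hypothetical. When every pairwise distance is large, all the `exp(-d)` weights underflow to zero, the softmax becomes 0/0, and the loss is NaN. Adding an epsilon to the denominator removes the NaN but leaves every probability at exactly zero, which would be consistent with a loss that never moves. Computing the same quantity in log space with the row maximum subtracted avoids both problems.

```python
import numpy as np

def nca_loss_naive(dist2, same_class, eps=0.0):
    """Textbook NCA loss via an explicit softmax over -dist2 (illustrative only)."""
    d = dist2.copy()
    np.fill_diagonal(d, np.inf)               # a point never selects itself
    with np.errstate(divide="ignore", invalid="ignore"):
        w = np.exp(-d)                        # underflows to 0 for large distances
        p = w / (w.sum(axis=1, keepdims=True) + eps)   # 0/0 -> nan when eps == 0
        p_correct = (p * same_class).sum(axis=1)
        return -np.log(p_correct).mean()

def _logsumexp_rows(a):
    """Row-wise log(sum(exp(a))) with the row maximum subtracted first."""
    m = a.max(axis=1, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)      # guard rows that are all -inf
    with np.errstate(divide="ignore"):
        return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def nca_loss_stable(dist2, same_class):
    """Same loss computed in log space, so large distances do not underflow."""
    logits = -dist2.copy()
    np.fill_diagonal(logits, -np.inf)
    log_den = _logsumexp_rows(logits)
    log_num = _logsumexp_rows(np.where(same_class, logits, -np.inf))
    return -(log_num - log_den).mean()

# Toy data (hypothetical): four widely separated points, two classes.
X = np.array([[0.0, 0.0], [40.0, 0.0], [0.0, 40.0], [40.0, 40.0]])
y = np.array([0, 0, 1, 1])
dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
same_class = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)

naive = nca_loss_naive(dist2, same_class)
naive_eps = nca_loss_naive(dist2, same_class, eps=1e-12)
stable = nca_loss_stable(dist2, same_class)
print(naive, naive_eps, stable)
```

In PyTorch the same shift is available as `torch.logsumexp`. Standardizing or otherwise rescaling the features so that squared distances stay moderate is a complementary option.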

tarachari3 • Jul 21 '20 18:07

I also had problems with a complex data set: hundreds of features, tens of thousands of samples. I made some progress by naively using doubles instead of floats and turning down the learning rate (1e-6 or 1e-7 in my case). But things still seem unstable, and the loss still varies quite a lot more than I would expect. It does feel like something is off in the loss.
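A small sketch of why doubles might help without fully fixing things, assuming the loss exponentiates negative squared distances (an assumption about the loss form, not something confirmed from the library's source). The value of `dist2` is a hypothetical example. A wider float type only pushes the underflow threshold further out; it does not remove it.

```python
import numpy as np

dist2 = 200.0  # hypothetical squared distance between two samples

# The same softmax weight exp(-dist2) in single and double precision.
w32 = np.exp(-np.float32(dist2))
w64 = np.exp(-np.float64(dist2))

# Squared distance beyond which exp(-dist2) drops below the smallest
# normal number of each type (roughly 87 for float32, 708 for float64).
limit32 = -np.log(np.finfo(np.float32).tiny)
limit64 = -np.log(np.finfo(np.float64).tiny)

print("float32:", w32, "limit:", limit32)
print("float64:", w64, "limit:", limit64)
```

With hundreds of unscaled features, squared distances can exceed either limit, so doubles alone would delay the problem rather than eliminate it.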

gerner • Jan 05 '21 17:01

@gerner I'm receiving the same error, any further tips?

GeorgePearse • Nov 01 '21 17:11

@GeorgePearse It's been a while since I did anything with torchnca, in part because of bugs and operational issues like this. I think I was trying to do some unsupervised clustering work at the time.

I've since moved on to using a UMAP + HDBSCAN approach which seems compelling.

gerner • Nov 01 '21 17:11

@gerner Cheers for getting back to me. I just need a library that implements a memory-efficient supervised transformation to improve the performance of a KNN classifier. http://contrib.scikit-learn.org/metric-learn/ looked like it should do the trick, but nothing else seems to be able to run in batches, and I max out my server's memory when I test them. Thanks once more anyway.

GeorgePearse • Nov 01 '21 18:11

@GeorgePearse I had some success with LMNN. There's a Python implementation, PyLMNN. Perhaps you can give it a try?

gerner • Nov 02 '21 00:11

@gerner Not sure how many iterations I have to run for it to improve my KNN, but I'm having much better luck (in that it runs at all). Dimensions of the input are posted for others' reference. I will try to find out the memory consumption of the process, as this is a very high-spec server. Thanks once more.

[image: input dimensions]

GeorgePearse • Nov 03 '21 08:11