torchnca
NaN/divide-by-zero issues in loss function with sparse data input
A sparse input matrix (30k cells x 6000 genes of single-cell sequencing data, with 35 cell labels) produces NaNs in the loss function during training. Adding a small epsilon (1e-12) to the softmax calculation to prevent division by zero does not seem to improve training either: the loss does not decrease. Is there a possible issue with the distance calculation on sparse input, or is further parameter tuning necessary?
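For concreteness, the kind of change I tried is sketched below: an NCA-style row softmax over negative squared distances with the diagonal masked out and a small epsilon guarding the denominator; subtracting the row max first (the log-sum-exp trick) is included as the usual extra safeguard against overflow. A simplified sketch, not torchnca's actual code.

```python
import torch

def masked_softmax(neg_sq_dists, eps=1e-12):
    """NCA-style p_ij: row softmax of -||x_i - x_j||^2 with p_ii forced to 0."""
    n = neg_sq_dists.size(0)
    # Subtract each row's max before exponentiating (log-sum-exp trick) so the
    # exponentials cannot overflow; the epsilon then guards the denominator.
    shifted = neg_sq_dists - neg_sq_dists.max(dim=1, keepdim=True).values
    exp = torch.exp(shifted)
    exp = exp * (1.0 - torch.eye(n, dtype=exp.dtype))  # zero out self-probability
    return exp / (exp.sum(dim=1, keepdim=True) + eps)
```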
I also had problems with a complex data set: hundreds of features, tens of thousands of samples. I made some progress by naively using doubles instead of floats and turning down the learning rate (to 1e-6 or 1e-7 in my case). But things still seem unstable, and the loss still varies quite a lot more than I would expect. It does feel like something is off in the loss.
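Concretely, those two changes amount to something like the sketch below; the linear projection is only a stand-in for the learned transform, not torchnca's API, and the sizes are illustrative.

```python
import torch

torch.manual_seed(0)
X = torch.randn(1000, 50, dtype=torch.float64)        # doubles instead of floats
A = torch.nn.Linear(50, 10, bias=False).double()      # keep the transform in float64 too
optimizer = torch.optim.SGD(A.parameters(), lr=1e-7)  # turned-down learning rate
```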
@gerner I'm receiving the same error; any further tips?
@GeorgePearse It's been a while since I did anything with torchnca, in part because of bug/operational issues like this one. I think I was trying to do some unsupervised clustering work at the time.
I've since moved on to a UMAP + HDBSCAN approach, which seems compelling.
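Roughly, assuming the umap-learn and hdbscan packages (the parameter values here are illustrative, not tuned):

```python
import numpy as np
import umap     # umap-learn package
import hdbscan

X = np.random.rand(1000, 50)  # stand-in feature matrix
embedding = umap.UMAP(n_components=10).fit_transform(X)               # nonlinear reduction
labels = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(embedding)  # -1 marks noise points
```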
@gerner Cheers for getting back to me. I just need a library that implements a memory-efficient supervised transformation to improve the performance of a KNN classifier. http://contrib.scikit-learn.org/metric-learn/ looked like it should do the trick, but none of its methods seem to be able to run in batches, and I max out my server's memory when I test them. Thanks once more anyway.
@GeorgePearse I had some success with LMNN. There's a python implementation, PyLMNN. Perhaps you can give it a try?
@gerner Not sure how many iterations I'll have to run for it to improve my KNN, but I'm having much better luck (the fact that it runs at all). Dimensions of the input posted for others' reference. I'll try to find out the memory consumption of the process, as this is a very high-spec server. Thanks once more.