Training stuck at one certain epoch

Open derkbreeze opened this issue 1 year ago • 0 comments

Hi Meitar,

So I was training on the MNIST dataset using pretrained features, e.g.

python DeepDPM.py --dataset MNIST --dir './pretrained_embeddings/umap_embedded_datasets/MNIST' --gpus 0

but every time, training gets stuck at epoch 44 and does not continue. Log:

Epoch 0: 100%|███████████| 547/547 [00:00<00:00, 661.71it/s, loss=nan, v_num=]Initializing clusters params using Kmeans... Epoch 44: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 547/547 [00:19<00:00, 27.60it/s, loss=0, v_num=]

Also, why does the loss become nan in the first epoch? I would appreciate any suggestions!

derkbreeze, Apr 21 '23 12:04