IntroVAE

Crashes on 1-GPU with log access error

Open gwern opened this issue 4 years ago • 1 comment

When I run IntroVAE on 1 GPU (to test how it works on my anime faces), I get an indexing/scalar error from PyTorch (TypeError: only integer scalar arrays can be converted to a scalar index), which appears to be due to the [0] accessors in the logging statements. I assume the code expects multiple GPUs and multiple metrics, so it tries to access the first one, but the value is a scalar rather than a list/array, so indexing it fails. To make IntroVAE run, I need to patch the logging statements to remove the accessors:

-        info += 'Rec: {:.4f}, '.format(loss_rec.data[0])
-        info += 'Kl_E: {:.4f}, {:.4f}, {:.4f}, '.format(lossE_real_kl.data[0],
-                                lossE_rec_kl.data[0], lossE_fake_kl.data[0])
-        info += 'Kl_G: {:.4f}, {:.4f}, '.format(lossG_rec_kl.data[0], lossG_fake_kl.data[0])
-
+
+        info += 'Rec: {:.4f}, '.format(loss_rec.data)
+        info += 'Kl_E: {:.4f}, {:.4f}, {:.4f}, '.format(lossE_real_kl.data,
+                                lossE_rec_kl.data, lossE_fake_kl.data)
+        info += 'Kl_G: {:.4f}, {:.4f}, '.format(lossG_rec_kl.data, lossG_fake_kl.data)

Might be worth fixing somehow, or at least documenting.

gwern, Oct 21 '19 14:10

This is not related to the number of GPUs. In previous versions of PyTorch, .data[0] was used to access the underlying Python float from a size-1 tensor. The preferred way is now .item().

EtienneDesticourt, Feb 26 '20 08:02
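
For reference, here is a minimal sketch of how the logging lines from the patch could use .item() as suggested in the reply. The helper names as_float and format_losses are hypothetical and not part of IntroVAE; the sketch assumes each loss is either a one-element tensor (which exposes .item() in PyTorch 0.4 and later) or a plain Python number.

```python
def as_float(value):
    """Return a plain Python float from a one-element tensor or a number.

    Hypothetical helper, not part of IntroVAE. One-element PyTorch tensors
    expose .item(); plain Python numbers do not, so fall back to float().
    """
    item = getattr(value, "item", None)
    return float(item()) if callable(item) else float(value)


def format_losses(loss_rec, kl_e, kl_g):
    """Build the log string from the issue without indexing with [0].

    kl_e is a sequence of three encoder KL losses and kl_g a sequence of
    two generator KL losses, mirroring the format strings in the patch.
    """
    info = 'Rec: {:.4f}, '.format(as_float(loss_rec))
    info += 'Kl_E: {:.4f}, {:.4f}, {:.4f}, '.format(*(as_float(v) for v in kl_e))
    info += 'Kl_G: {:.4f}, {:.4f}, '.format(*(as_float(v) for v in kl_g))
    return info


# Possible use inside the training loop (variable names taken from the issue):
#   info += format_losses(
#       loss_rec,
#       (lossE_real_kl, lossE_rec_kl, lossE_fake_kl),
#       (lossG_rec_kl, lossG_fake_kl),
#   )
```

Converting to a Python float before formatting keeps the log line independent of how a given PyTorch version prints or indexes a scalar tensor.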