
Training Freezes Before Starting

Open yukiarimo opened this issue 1 year ago · 1 comment

(tiny-audio-diffusion) yuki@yuki tiny-audio-diffusion % python train.py exp=drum_diffusion trainer.gpus=1 datamodule.dataset.path=/Users/yuki/Downloads/tiny-audio-diffusion/samples
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2024-05-11 02:16:26,217][main.utils][INFO] - Disabling python warnings! <config.ignore_warnings=True>
Global seed set to 12345
[2024-05-11 02:16:26,220][__main__][INFO] - Instantiating datamodule <main.diffusion_module.Datamodule>.
[2024-05-11 02:16:27,005][__main__][INFO] - Instantiating model <main.diffusion_module.Model>.
[2024-05-11 02:16:27,183][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.RichProgressBar>.
[2024-05-11 02:16:27,183][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.ModelCheckpoint>.
[2024-05-11 02:16:27,185][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.RichModelSummary>.
[2024-05-11 02:16:27,186][__main__][INFO] - Instantiating callback <main.diffusion_module.SampleLogger>.
[2024-05-11 02:16:27,187][__main__][INFO] - Instantiating logger <pytorch_lightning.loggers.wandb.WandbLogger>.
wandb: Currently logged in as: yukiarimo. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.17.0
wandb: Run data is saved locally in /Users/yuki/Downloads/tiny-audio-diffusion/logs/wandb/run-20240511_021628-7k1pjexi
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run unconditional_diffusion
wandb: โญ๏ธ View project at https://wandb.ai/yukiarimo/wandbprojectname
wandb: ๐Ÿš€ View run at https://wandb.ai/yukiarimo/wandbprojectname/runs/7k1pjexi
[2024-05-11 02:16:33,399][__main__][INFO] - Instantiating trainer <pytorch_lightning.Trainer>.
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2024-05-11 02:16:33,438][__main__][INFO] - Logging hyperparameters!
[2024-05-11 02:16:33,456][__main__][INFO] - Starting training.
โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ   โ”ƒ Name                โ”ƒ Type           โ”ƒ Params โ”ƒ
โ”กโ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ 0 โ”‚ model               โ”‚ DiffusionModel โ”‚ 31.6 M โ”‚
โ”‚ 1 โ”‚ model.net           โ”‚ Module         โ”‚ 31.6 M โ”‚
โ”‚ 2 โ”‚ model.diffusion     โ”‚ VDiffusion     โ”‚ 31.6 M โ”‚
โ”‚ 3 โ”‚ model.sampler       โ”‚ VSampler       โ”‚ 31.6 M โ”‚
โ”‚ 4 โ”‚ model_ema           โ”‚ EMA            โ”‚ 63.1 M โ”‚
โ”‚ 5 โ”‚ model_ema.ema_model โ”‚ DiffusionModel โ”‚ 31.6 M โ”‚
โ””โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Trainable params: 31.6 M                                                        
Non-trainable params: 31.6 M                                                    
Total params: 63.1 M                                                            
Total estimated model params size (MB): 126                                     
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[the four "GPU available / TPU / IPU / HPU" lines above repeat seven more times]

yukiarimo avatar May 11 '24 08:05 yukiarimo

Hi @yukiarimo. The logs you shared are fairly standard, so they don't give much context about what your issue might be. I have only tested this repo on NVIDIA GPUs and CPUs, not on MPS, so I would start there (i.e. remove the trainer.gpus=1 argument to fall back to CPU).
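A sketch of what that command could look like with the GPU flag removed (assuming the same Hydra-style overrides and dataset path as in the original report):

```shell
# Same invocation as in the report, minus the trainer.gpus=1 override,
# so PyTorch Lightning falls back to CPU training instead of MPS
python train.py exp=drum_diffusion \
    datamodule.dataset.path=/Users/yuki/Downloads/tiny-audio-diffusion/samples
```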

crlandsc avatar May 17 '24 15:05 crlandsc