
Training Freezes Before Starting

Open yukiarimo opened this issue 1 year ago · 1 comment

(tiny-audio-diffusion) yuki@yuki tiny-audio-diffusion % python train.py exp=drum_diffusion trainer.gpus=1 datamodule.dataset.path=/Users/yuki/Downloads/tiny-audio-diffusion/samples
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2024-05-11 02:16:26,217][main.utils][INFO] - Disabling python warnings! <config.ignore_warnings=True>
Global seed set to 12345
[2024-05-11 02:16:26,220][__main__][INFO] - Instantiating datamodule <main.diffusion_module.Datamodule>.
[2024-05-11 02:16:27,005][__main__][INFO] - Instantiating model <main.diffusion_module.Model>.
[2024-05-11 02:16:27,183][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.RichProgressBar>.
[2024-05-11 02:16:27,183][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.ModelCheckpoint>.
[2024-05-11 02:16:27,185][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.RichModelSummary>.
[2024-05-11 02:16:27,186][__main__][INFO] - Instantiating callback <main.diffusion_module.SampleLogger>.
[2024-05-11 02:16:27,187][__main__][INFO] - Instantiating logger <pytorch_lightning.loggers.wandb.WandbLogger>.
wandb: Currently logged in as: yukiarimo. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.17.0
wandb: Run data is saved locally in /Users/yuki/Downloads/tiny-audio-diffusion/logs/wandb/run-20240511_021628-7k1pjexi
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run unconditional_diffusion
wandb: โญ๏ธ View project at https://wandb.ai/yukiarimo/wandbprojectname
wandb: ๐Ÿš€ View run at https://wandb.ai/yukiarimo/wandbprojectname/runs/7k1pjexi
[2024-05-11 02:16:33,399][__main__][INFO] - Instantiating trainer <pytorch_lightning.Trainer>.
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2024-05-11 02:16:33,438][__main__][INFO] - Logging hyperparameters!
[2024-05-11 02:16:33,456][__main__][INFO] - Starting training.
โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ   โ”ƒ Name                โ”ƒ Type           โ”ƒ Params โ”ƒ
โ”กโ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ 0 โ”‚ model               โ”‚ DiffusionModel โ”‚ 31.6 M โ”‚
โ”‚ 1 โ”‚ model.net           โ”‚ Module         โ”‚ 31.6 M โ”‚
โ”‚ 2 โ”‚ model.diffusion     โ”‚ VDiffusion     โ”‚ 31.6 M โ”‚
โ”‚ 3 โ”‚ model.sampler       โ”‚ VSampler       โ”‚ 31.6 M โ”‚
โ”‚ 4 โ”‚ model_ema           โ”‚ EMA            โ”‚ 63.1 M โ”‚
โ”‚ 5 โ”‚ model_ema.ema_model โ”‚ DiffusionModel โ”‚ 31.6 M โ”‚
โ””โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Trainable params: 31.6 M                                                        
Non-trainable params: 31.6 M                                                    
Total params: 63.1 M                                                            
Total estimated model params size (MB): 126                                     
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[the four "GPU available / TPU / IPU / HPU" lines above repeat seven more times]

yukiarimo avatar May 11 '24 08:05 yukiarimo

Hi @yukiarimo. The logs you shared are fairly standard, so they don't give much context about what your issue might be. I have only tested this repo on NVIDIA GPUs and CPUs, not on MPS, so I would start there (i.e. remove the trainer.gpus=1 argument to fall back to CPU).
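A sketch of what that command could look like with the GPU flag removed (assuming the same Hydra-style overrides and dataset path as in the original report):

```shell
# Same invocation as in the report, minus the trainer.gpus=1 override,
# so PyTorch Lightning falls back to CPU training instead of MPS
python train.py exp=drum_diffusion \
    datamodule.dataset.path=/Users/yuki/Downloads/tiny-audio-diffusion/samples
```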

crlandsc avatar May 17 '24 15:05 crlandsc