
Unable to train model on SPEECHCOMMANDS dataset

Open jayathungek opened this issue 2 years ago • 0 comments

I've downloaded the speech_commands_v0.02 tar file and extracted it into the following directory structure:

data/SCNUMBERS1024
└── SpeechCommands
    └── speech_commands_v0.02
        ├── _background_noise_
        ├── backward
        ├── bed
        ├── bird
        ├── cat
        ├── dog
        ├── down
        ├── eight
        ├── five
        ├── follow
        ├── forward
        ├── four
        ├── go
        ├── happy
        ├── house
        ├── learn
        ├── left
        ├── marvin
        ├── nine
        ├── no
        ├── off
        ├── on
        ├── one
        ├── right
        ├── seven
        ├── sheila
        ├── six
        ├── stop
        ├── three
        ├── tree
        ├── two
        ├── up
        ├── visual
        ├── wow
        ├── yes
        └── zero
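As a sanity check on the layout (a hypothetical helper, not part of the repo — the function name and the `ROOT` path are my own), this snippet lists the label folders that should sit under `speech_commands_v0.02`:

```python
from pathlib import Path

def check_speechcommands_layout(root):
    """Return the sorted label folders under speech_commands_v0.02,
    the layout torchaudio's SPEECHCOMMANDS loader expects."""
    dataset_dir = Path(root) / "SpeechCommands" / "speech_commands_v0.02"
    if not dataset_dir.is_dir():
        raise FileNotFoundError(f"expected dataset at {dataset_dir}")
    return sorted(p.name for p in dataset_dir.iterdir() if p.is_dir())

# Guarded demo so it degrades gracefully if the data is elsewhere:
try:
    labels = check_speechcommands_layout("data/SCNUMBERS1024")
    print(len(labels), "label folders found")
except FileNotFoundError as err:
    print(err)
```

For v0.02 this should report the 35 word folders plus `_background_noise_`, matching the tree above.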

I then try to train the model on this dataset via:

$ python train.py --wandb 0 --architecture pi-gan_wide --dataset_name SPEECHCOMMANDS --dataset_size 128

but run into a NoneType error, which leads me to believe that the dataset is not initialised properly somehow. The full output of running the above command is below:

~/.virtualenvs/pcinr/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  warnings.warn(
1
{   'architecture': 'pi-gan_wide',
    'audio_length': 16000,
    'autoconfig': 0,
    'batch_size': 128,
    'cdpam': 0,
    'coord_multi': 1,
    'dataset_name': 'SPEECHCOMMANDS',
    'dataset_size': 128,
    'deriv_per_sample': 1,
    'double': 0,
    'eval_every': 5000,
    'eval_samples': 1,
    'eval_upscale_ratio': 1,
    'first_omega_0': 3000,
    'hidden_omega_0': 30,
    'input_dim': 1,
    'latent_descent_steps': 1,
    'latent_init_std': 0.001,
    'latent_lr': 0.3,
    'lr': 1e-05,
    'max_high_res_batch_size': 16,
    'meta_architecture': 'autodecoder',
    'multiscale_STFT': 0,
    'note': 'default',
    'note_general': 'default',
    'num_epochs': 10001,
    'num_groups': 0,
    'num_latent': 256,
    'output_dim': 1,
    'per_sample': 1,
    'prog_weight_decay_every': 0,
    'prog_weight_decay_factor': 0,
    'sample_even': 1,
    'samples_per_datapoint': 2000,
    'save_audio': 1,
    'save_audio_plots': 0,
    'save_latents': 1,
    'save_model': 1,
    'save_path': 'results/default/SPEECHCOMMANDS/pi-gan_wide/autodecoder',
    'use_gpu': 1,
    'use_multi_gpu': 0,
    'wandb': 0,
    'wandb_project_name': 'neurips',
    'weight_decay': 0,
    'weight_norm': 0}
activations: ['sine', 'sine', 'none']
init_methods: [{'weights': 'siren_first', 'bias': 'polar'}, {'weights': 'siren', 'bias': 'polar'}, {'weights': 'siren_omega', 'omega': 30, 'bias': 'none'}]
layer 0: Film conditioned
layer 1: Film conditioned
layer 2: Film conditioned
layer 3: Film conditioned
piGAN_custom(
  (film_mapping_net): PiGANMappingNetwork(
    (net): Sequential(
      (0): Linear(in_features=256, out_features=256, bias=True)
      (1): LeakyReLU(negative_slope=0.2, inplace=True)
      (2): Linear(in_features=256, out_features=256, bias=True)
      (3): LeakyReLU(negative_slope=0.2, inplace=True)
      (4): Linear(in_features=256, out_features=256, bias=True)
      (5): LeakyReLU(negative_slope=0.2, inplace=True)
      (6): Linear(in_features=256, out_features=730, bias=True)
    )
  )
  (net): Sequential(
    (0): ImplicitMLPLayer(
      (linear): Linear(in_features=1, out_features=365, bias=True)
    )
    (1): ImplicitMLPLayer(
      (linear): Linear(in_features=365, out_features=365, bias=True)
    )
    (2): ImplicitMLPLayer(
      (linear): Linear(in_features=365, out_features=365, bias=True)
    )
    (3): ImplicitMLPLayer(
      (linear): Linear(in_features=365, out_features=365, bias=True)
    )
    (4): ImplicitMLPLayer(
      (linear): Linear(in_features=365, out_features=1, bias=True)
    )
  )
)
Number of parameters: 786852
Random Seed:  0
~/Desktop/phd/continuous-audio-representations/objective.py:11: UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
  self.finite_diff_derivative = torch.range(-1,1,2).unsqueeze(0).unsqueeze(0).to(device)
Seeing  1 GPUs
Starting run for 10001 epochs..
Traceback (most recent call last):
  File "train.py", line 358, in <module>
    train(model, optim_INR, optim_mapping, scheduler, train_loader, config)
  File "train.py", line 80, in train
    g = model(sampled_coords, z=z)
  File "/home/kavi/.virtualenvs/pcinr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/kavi/.virtualenvs/pcinr/lib/python3.8/site-packages/INR_collection/modules.py", line 487, in forward
    concat = concat.repeat(1, coordinates.shape[1], 1)
AttributeError: 'NoneType' object has no attribute 'repeat'
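For reference, the same AttributeError appears whenever `.repeat` is called on a value that was never assigned a tensor. This is only an illustration of the failure mode, not the library's actual code — `concat` here stands in for the internal variable in `INR_collection/modules.py`:

```python
# `concat` stands in for the variable built from the latent code in
# modules.py; if the upstream lookup yields None, the .repeat call on
# line 487 raises exactly the error seen in the traceback.
concat = None

try:
    concat.repeat(1, 2000, 1)  # mirrors concat.repeat(1, coordinates.shape[1], 1)
except AttributeError as err:
    print(f"AttributeError: {err}")
```

So the symptom suggests the latent `z` (or whatever feeds `concat`) is never populated, which is why I suspect the dataset initialisation rather than the model itself.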

Any idea why this might be? Thank you.

jayathungek avatar Apr 22 '22 10:04 jayathungek