continuous-audio-representations
Unable to train model on SPEECHCOMMANDS dataset
I've downloaded the speech_commands_v0.02 tar file and extracted it into the following directory structure:
data/SCNUMBERS1024
└── SpeechCommands
    └── speech_commands_v0.02
        ├── _background_noise_
        ├── backward
        ├── bed
        ├── bird
        ├── cat
        ├── dog
        ├── down
        ├── eight
        ├── five
        ├── follow
        ├── forward
        ├── four
        ├── go
        ├── happy
        ├── house
        ├── learn
        ├── left
        ├── marvin
        ├── nine
        ├── no
        ├── off
        ├── on
        ├── one
        ├── right
        ├── seven
        ├── sheila
        ├── six
        ├── stop
        ├── three
        ├── tree
        ├── two
        ├── up
        ├── visual
        ├── wow
        ├── yes
        └── zero
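For anyone who wants to confirm that the layout above is actually what is on disk, here is a minimal sketch using only the standard library. The function name `summarize_speech_commands` is hypothetical and not part of the repository; it only assumes the `SpeechCommands/speech_commands_v0.02` nesting shown above and that the clips are `.wav` files sitting directly inside each class folder.

```python
from pathlib import Path


def summarize_speech_commands(root):
    """Return {class_name: number_of_wav_files} for an extracted dataset.

    `root` is the directory that contains `SpeechCommands/`
    (e.g. "data/SCNUMBERS1024" in the layout above).
    """
    base = Path(root) / "SpeechCommands" / "speech_commands_v0.02"
    if not base.is_dir():
        raise FileNotFoundError(f"expected dataset directory at {base}")

    counts = {}
    # Every sub-directory is treated as one class (word) folder.
    for class_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.wav"))
    return counts


# Example usage (uncomment to run against the real data):
# for name, n in summarize_speech_commands("data/SCNUMBERS1024").items():
#     print(f"{name}: {n} wav files")
```

A class folder that reports zero files, or a `FileNotFoundError`, would point at an extraction problem rather than at the training code. It does not by itself explain the error below.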
I then try to train the model on this dataset via:
$ python train.py --wandb 0 --architecture pi-gan_wide --dataset_name SPEECHCOMMANDS --dataset_size 128
but run into a NoneType error, which leads me to believe that the dataset is somehow not initialised properly. The full output of running the above command is below:
~/.virtualenvs/pcinr/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
warnings.warn(
1
{ 'architecture': 'pi-gan_wide',
'audio_length': 16000,
'autoconfig': 0,
'batch_size': 128,
'cdpam': 0,
'coord_multi': 1,
'dataset_name': 'SPEECHCOMMANDS',
'dataset_size': 128,
'deriv_per_sample': 1,
'double': 0,
'eval_every': 5000,
'eval_samples': 1,
'eval_upscale_ratio': 1,
'first_omega_0': 3000,
'hidden_omega_0': 30,
'input_dim': 1,
'latent_descent_steps': 1,
'latent_init_std': 0.001,
'latent_lr': 0.3,
'lr': 1e-05,
'max_high_res_batch_size': 16,
'meta_architecture': 'autodecoder',
'multiscale_STFT': 0,
'note': 'default',
'note_general': 'default',
'num_epochs': 10001,
'num_groups': 0,
'num_latent': 256,
'output_dim': 1,
'per_sample': 1,
'prog_weight_decay_every': 0,
'prog_weight_decay_factor': 0,
'sample_even': 1,
'samples_per_datapoint': 2000,
'save_audio': 1,
'save_audio_plots': 0,
'save_latents': 1,
'save_model': 1,
'save_path': 'results/default/SPEECHCOMMANDS/pi-gan_wide/autodecoder',
'use_gpu': 1,
'use_multi_gpu': 0,
'wandb': 0,
'wandb_project_name': 'neurips',
'weight_decay': 0,
'weight_norm': 0}
activations: ['sine', 'sine', 'none']
init_methods: [{'weights': 'siren_first', 'bias': 'polar'}, {'weights': 'siren', 'bias': 'polar'}, {'weights': 'siren_omega', 'omega': 30, 'bias': 'none'}]
layer 0: Film conditioned
layer 1: Film conditioned
layer 2: Film conditioned
layer 3: Film conditioned
piGAN_custom(
(film_mapping_net): PiGANMappingNetwork(
(net): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Linear(in_features=256, out_features=256, bias=True)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
(4): Linear(in_features=256, out_features=256, bias=True)
(5): LeakyReLU(negative_slope=0.2, inplace=True)
(6): Linear(in_features=256, out_features=730, bias=True)
)
)
(net): Sequential(
(0): ImplicitMLPLayer(
(linear): Linear(in_features=1, out_features=365, bias=True)
)
(1): ImplicitMLPLayer(
(linear): Linear(in_features=365, out_features=365, bias=True)
)
(2): ImplicitMLPLayer(
(linear): Linear(in_features=365, out_features=365, bias=True)
)
(3): ImplicitMLPLayer(
(linear): Linear(in_features=365, out_features=365, bias=True)
)
(4): ImplicitMLPLayer(
(linear): Linear(in_features=365, out_features=1, bias=True)
)
)
)
Number of parameters: 786852
Random Seed: 0
~/Desktop/phd/continuous-audio-representations/objective.py:11: UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
self.finite_diff_derivative = torch.range(-1,1,2).unsqueeze(0).unsqueeze(0).to(device)
Seeing 1 GPUs
Starting run for 10001 epochs..
Traceback (most recent call last):
File "train.py", line 358, in <module>
train(model, optim_INR, optim_mapping, scheduler, train_loader, config)
File "train.py", line 80, in train
g = model(sampled_coords, z=z)
File "/home/kavi/.virtualenvs/pcinr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/kavi/.virtualenvs/pcinr/lib/python3.8/site-packages/INR_collection/modules.py", line 487, in forward
concat = concat.repeat(1, coordinates.shape[1], 1)
AttributeError: 'NoneType' object has no attribute 'repeat'
Any idea why this might be? Thank you.