audiocraft
Training Help - Error opening file ... : RuntimeError('cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous')
Hi, I'm trying to train a new model from scratch with MusicGen on a new dataset, and I'm finding the docs quite difficult to follow.
Please see my folder structure and approach:
The command to initiate training:
!cd "audiocraft" && dora run -d solver=musicgen/musicgen_base_test_1 dset=audio/test_1
I have configured my custom solver in config/solver/musicgen/musicgen_base_test_1.yaml:
# @package __global__
# This is the training loop solver
# for the base MusicGen model (text-to-music)
# on monophonic audio sampled at 32 kHz
defaults:
  - musicgen/default
  - /model: lm/musicgen_lm
  - override /dset: audio/default
  - _self_
autocast: true
autocast_dtype: float16
# EnCodec large trained on mono-channel music audio sampled at 32khz
# with a total stride of 640 leading to 50 frames/s.
# rvq.n_q=4, rvq.bins=2048, no quantization dropout
# (transformer_lm card and n_q must be compatible)
compression_model_checkpoint: //pretrained/facebook/encodec_32khz
channels: 1
sample_rate: 32000
deadlock:
  use: true # deadlock detection
dataset:
  batch_size: 4 # 32 GPUs
  sample_on_weight: false # Uniform sampling all the way
  sample_on_duration: false # Uniform sampling all the way
generate:
  lm:
    use_sampling: true
    top_k: 250
    top_p: 0.0
optim:
  epochs: 5
  optimizer: dadam
  lr: 1
  ema:
    use: true
    updates: 10
    device: cuda
logging:
  log_tensorboard: true
schedule:
  lr_scheduler: cosine
  cosine:
    warmup: 2000
    lr_min_ratio: 0.0
    cycle_length: 1.0
I have my dset config in dset/test_1.yaml:
# @package __global__
datasource:
  max_sample_rate: 48000
  max_channels: 2
  train: egs/test_1/train
  valid: egs/test_1/test
  evaluate: egs/test_1/train
  generate: egs/test_1/test
Finally, I have my dataset and data:
dataset/test_1/train/data.jsonl
dataset/test_1/test/data.jsonl
Both of these files contain entries like this:
{"path": "dataset/test_1/115775.mp3", "duration": 181, "sample_rate": 48000, "amplitude": null, "weight": null, "info_path": null}
...
...
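A quick way to rule out path problems in these manifests is to confirm that every `path` resolves to a non-empty file from the directory where training is launched. The following is a minimal sketch, not part of audiocraft: `check_manifest` and its `root` argument are hypothetical names, and it assumes relative paths are resolved against the working directory.

```python
import json
from pathlib import Path


def check_manifest(manifest_path, root="."):
    """Return (line_number, path, problem) tuples for suspicious entries.

    Assumption: relative `path` values are resolved against `root`,
    which should be the directory the training command runs from.
    """
    problems = []
    root = Path(root)
    with open(manifest_path, "r", encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            audio_path = root / entry["path"]
            if not audio_path.is_file():
                problems.append((line_number, entry["path"], "file not found"))
            elif audio_path.stat().st_size == 0:
                problems.append((line_number, entry["path"], "file is empty"))
            elif not entry.get("duration") or entry["duration"] <= 0:
                problems.append((line_number, entry["path"], "non-positive duration"))
    return problems
```

Calling `check_manifest("dataset/test_1/train/data.jsonl")` from the launch directory would return an empty list if every entry looks plausible, and otherwise the entries worth inspecting by hand.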
Each audio file has a 'manifest' file in the form:
{"key": "A#", "artist": "Alec K. Redfearn & the Eyesores", "sample_rate": 44100, "file_extension": "mp3", "description": "Folk", "keywords": ["Folk"], "duration": 182, "bpm": 103, "genre": "Folk", "title": "Ohio", "name": "Ohio", "instrument": "mix", "moods": ["Folk"]}
I can adjust this as needed, but I want to get training working before I mess with parameters.
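The per-file metadata can be sanity-checked with a short script. This sketch assumes each metadata file sits next to its audio file with the same name and a `.json` suffix, which is an assumption about the layout rather than something confirmed here. `missing_metadata_keys` and `EXPECTED_KEYS` are hypothetical names, and the key list is simply copied from the example metadata, so whether training needs all of them is also an assumption.

```python
import json
from pathlib import Path

# Keys copied from the example metadata in this post; whether the
# training code requires every one of them is an assumption.
EXPECTED_KEYS = {
    "key", "artist", "sample_rate", "file_extension", "description",
    "keywords", "duration", "bpm", "genre", "title", "name",
    "instrument", "moods",
}


def missing_metadata_keys(audio_path):
    """Return the expected keys absent from the audio file's sidecar JSON.

    Returns None when no sidecar file exists next to the audio file.
    Assumption: the sidecar shares the audio file's name with a .json suffix.
    """
    sidecar = Path(audio_path).with_suffix(".json")
    if not sidecar.is_file():
        return None
    with open(sidecar, "r", encoding="utf-8") as fh:
        metadata = json.load(fh)
    return sorted(EXPECTED_KEYS - set(metadata))
```

An empty list means every expected key is present, while `None` means the metadata file could not be found at the assumed location.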
Everything runs fine, then training hits an error:
Error opening file ... : RuntimeError('cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous')
Thanks in advance for any help with this particular issue. I would also appreciate any general tips on anything else I might be doing wrong. I really want to get a working model that isn't restricted by the license.