
Training Help - Error opening file ... : RuntimeError('cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous')

Open tomhaydn opened this issue 9 months ago • 3 comments

Hi, I'm trying to train a new model from scratch with MusicGen on a new dataset, and I'm finding the docs quite difficult to follow.

Please see my folder structure and approach:

The command to initiate training:

!cd "audiocraft" && dora run -d solver=musicgen/musicgen_base_test_1 dset=audio/test_1

I have configured my custom solver in config/solver/musicgen/musicgen_base_test_1.yaml:

# @package __global__

# This is the training loop solver
# for the base MusicGen model (text-to-music)
# on monophonic audio sampled at 32 kHz
defaults:
  - musicgen/default
  - /model: lm/musicgen_lm
  - override /dset: audio/default
  - _self_

autocast: true
autocast_dtype: float16

# EnCodec large trained on mono-channel music audio sampled at 32khz
# with a total stride of 640 leading to 50 frames/s.
# rvq.n_q=4, rvq.bins=2048, no quantization dropout
# (transformer_lm card and n_q must be compatible)
compression_model_checkpoint: //pretrained/facebook/encodec_32khz

channels: 1
sample_rate: 32000

deadlock:
  use: true  # deadlock detection

dataset:
  batch_size: 4 # 32 GPUs
  sample_on_weight: false  # Uniform sampling all the way
  sample_on_duration: false  # Uniform sampling all the way

generate:
  lm:
    use_sampling: true
    top_k: 250
    top_p: 0.0

optim:
  epochs: 5
  optimizer: dadam
  lr: 1
  ema:
    use: true
    updates: 10
    device: cuda

logging:
  log_tensorboard: true

schedule:
  lr_scheduler: cosine
  cosine:
    warmup: 2000
    lr_min_ratio: 0.0
    cycle_length: 1.0

I have my dset config in dset/test_1.yaml:

# @package __global__

datasource:
  max_sample_rate: 48000
  max_channels: 2

  train: egs/test_1/train
  valid: egs/test_1/test
  evaluate: egs/test_1/train
  generate: egs/test_1/test
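
The split paths above are relative, so it may be worth confirming that each one resolves to a directory that actually contains a manifest. Below is a minimal sketch of such a check. The helper name `find_missing_manifests` is hypothetical, and the assumption that a manifest is a file named `data.jsonl` or `data.jsonl.gz` sitting directly inside each split directory is mine, not something confirmed here.

```python
from pathlib import Path

# Assumption: a split directory is usable if it directly contains one of these files.
MANIFEST_NAMES = ("data.jsonl", "data.jsonl.gz")


def find_missing_manifests(datasource, root="."):
    """Return the names of the splits whose directory holds no manifest file.

    `datasource` maps split names to directories (as in the dset config);
    relative directories are resolved against `root`.
    """
    missing = []
    for split in ("train", "valid", "evaluate", "generate"):
        if split not in datasource:
            continue
        split_dir = Path(root) / datasource[split]
        if not any((split_dir / name).is_file() for name in MANIFEST_NAMES):
            missing.append(split)
    return missing


# Example call with the paths from the dset config above, resolved against the
# directory the training command is launched from:
#
#   find_missing_manifests({
#       "train": "egs/test_1/train",
#       "valid": "egs/test_1/test",
#       "evaluate": "egs/test_1/train",
#       "generate": "egs/test_1/test",
#   }, root="audiocraft")
```

An empty list means every configured split has a manifest where the config says it should be; any split name in the result points at a path that needs fixing.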

Finally, I have my dataset manifests:

dataset/test_1/train/data.jsonl
dataset/test_1/test/data.jsonl

Both of these look like this:

{"path": "dataset/test_1/115775.mp3", "duration": 181, "sample_rate": 48000, "amplitude": null, "weight": null, "info_path": null}
...
...
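
A manifest like this can be sanity-checked before training. The sketch below reads a JSON Lines file and reports entries whose audio file is missing or empty, or whose duration is not a positive number. It is only an illustration: `check_manifest` and `REQUIRED_KEYS` are hypothetical names, the set of required keys is a guess based on the entry shown above, and the check does not decode any audio.

```python
import json
from pathlib import Path

# Assumption: these are the keys every entry needs (taken from the entry shown above).
REQUIRED_KEYS = ("path", "duration", "sample_rate")


def check_manifest(manifest_path, audio_root="."):
    """Return a list of (line_number, problem) tuples for a JSON Lines manifest."""
    problems = []
    with open(manifest_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(entry, dict):
                problems.append((line_no, "entry is not a JSON object"))
                continue
            missing = [key for key in REQUIRED_KEYS if key not in entry]
            if missing:
                problems.append((line_no, f"missing keys: {missing}"))
                continue
            # Relative paths are resolved against audio_root.
            audio = Path(audio_root) / str(entry["path"])
            if not audio.is_file():
                problems.append((line_no, f"file not found: {audio}"))
            elif audio.stat().st_size == 0:
                problems.append((line_no, f"empty file: {audio}"))
            duration = entry["duration"]
            if not isinstance(duration, (int, float)) or duration <= 0:
                problems.append((line_no, f"non-positive duration: {duration!r}"))
    return problems


# Example call (audio_root should be the directory the relative paths are meant
# to be resolved from):
#
#   for line_no, problem in check_manifest("dataset/test_1/train/data.jsonl"):
#       print(f"line {line_no}: {problem}")
```

Passing this check does not prove a file decodes correctly; it only rules out the cheapest failure modes (wrong relative path, zero-byte file, bad duration).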

Each audio file has a 'manifest' file of the form:

{"key": "A#", "artist": "Alec K. Redfearn & the Eyesores", "sample_rate": 44100, "file_extension": "mp3", "description": "Folk", "keywords": ["Folk"], "duration": 182, "bpm": 103, "genre": "Folk", "title": "Ohio", "name": "Ohio", "instrument": "mix", "moods": ["Folk"]}

I can adjust this as needed, but I want to get training working before I mess with parameters.

Everything runs fine until it hits this error:

Error opening file ... : RuntimeError('cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous')
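
For what it's worth, the message itself can be reasoned about without any audio. The sketch below is a pure-Python illustration of how a reshape might resolve a single `-1` dimension. `infer_shape` is a hypothetical helper written for this post, not PyTorch's implementation, and the error strings are only modelled on the one above.

```python
from math import prod


def infer_shape(numel, shape):
    """Resolve a single -1 in `shape` for a tensor holding `numel` elements."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    # Product of the explicitly given dimensions (1 if there are none).
    known = prod(dim for dim in shape if dim != -1)
    if -1 not in shape:
        if known != numel:
            raise ValueError(f"shape {shape} is invalid for {numel} elements")
        return list(shape)
    if known == 0:
        if numel == 0:
            # 0 == x * 0 holds for every x, so -1 cannot be determined.
            raise ValueError(
                f"cannot reshape {numel} elements into {shape}: "
                "the -1 dimension can be any value and is ambiguous"
            )
        raise ValueError(f"shape {shape} is invalid for {numel} elements")
    if numel % known != 0:
        raise ValueError(f"shape {shape} is invalid for {numel} elements")
    return [numel // known if dim == -1 else dim for dim in shape]


print(infer_shape(6, [-1, 2]))  # [3, 2]

try:
    infer_shape(0, [-1, 0])
except ValueError as exc:
    print("error:", exc)
```

If that reading is right, the data being reshaped had zero elements and the target shape had a zero-sized dimension, which would suggest that nothing was decoded from the file (for example zero samples or zero channels). That is my guess from the message alone; I have not confirmed it.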

Thanks in advance for any help with this particular issue. I would also appreciate any general tips on anything else I might be doing wrong. I really want to get a working model that isn't restricted by the license.

tomhaydn, May 24 '24 10:05