muzic
[ROC] Mistakes in training
When I trained the model, the error "FileNotFoundError: Dataset not found: valid (data/lmd_processed/valid)" appeared. How can I solve the problem?
Please (1) check whether 'data/lmd_processed/valid.notes' exists, and (2) note that the training script was written for fairseq 0.10.1. If you are using a newer version, check whether the command-line arguments have changed, e.g., whether the source or target now needs to be specified.
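A minimal Python sketch of how both checks could be done. It assumes the data directory named in the error message and only lists files whose names start with the split name followed by a dot. The helper names split_files and fairseq_version are hypothetical. Which file extensions fairseq actually expects depends on how the data was preprocessed.

```python
import os
from typing import List, Optional


def split_files(data_dir: str, split: str = "valid") -> List[str]:
    """Hypothetical helper: list files in data_dir that belong to the given split."""
    if not os.path.isdir(data_dir):
        return []
    return sorted(f for f in os.listdir(data_dir) if f.startswith(split + "."))


def fairseq_version() -> Optional[str]:
    """Return the installed fairseq version string, or None if it is not importable."""
    try:
        import fairseq
    except ImportError:
        return None
    return fairseq.__version__


if __name__ == "__main__":
    data_dir = "data/lmd_processed"  # path taken from the error message
    print("valid files:", split_files(data_dir, "valid") or "none found")
    # The training script targets fairseq 0.10.1; other versions may need different flags.
    print("fairseq version:", fairseq_version() or "not installed")
```

An empty result for the valid split points to a preprocessing problem rather than a training-flag problem.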
@trestad Another error occurred during training:
fairseq-train data/lmd_processed/ \
    --arch transformer_lm --task language_modeling \
    --decoder-attention-heads 4 --decoder-embed-dim 256 \
    --decoder-input-dim 256 --decoder-output-dim 256 \
    --decoder-layers 4 --update-freq 1 --optimizer adam \
    --adam-betas '(0.9, 0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 \
    --warmup-updates 4000 --lr 0.0001 --attention-dropout 0.1 \
    --dropout 0.1 --weight-decay 0.01 --max-update 50000 \
    --save-dir music-ckps2 --batch-size 1 --max-target-positions 512 \
    --log-interval 100 --patience 20 --no-epoch-checkpoints \
    --best-checkpoint-metric 'ppl' | tee music-ckps/log.txt
2023-01-20 02:46:22 | WARNING | fairseq.tasks.fairseq_task | 18505 samples have invalid sizes and will be skipped, max_positions=512, first few sample ids=[18504, 4243, 11102, 10387, 11829, 27, 2933, 1156, 6782, 14445]
Traceback (most recent call last):
File "/opt/conda/envs/muzic/bin/fairseq-train", line 8, in
Based on your warning "18505 samples have invalid sizes and will be skipped, max_positions=512", I suspect that your samples are so long that all of them are skipped. You could try a larger '--max-target-positions', as the exception suggests: 'Exception: The dataset is empty. This could indicate that all elements in the dataset have been skipped. Try increasing the max number of allowed tokens or using a larger dataset.'
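Before choosing a new '--max-target-positions' value, it can help to see how long the samples actually are. The following is a minimal sketch. It assumes that the raw text used for preprocessing has one sample per line with whitespace-separated tokens, and that binarization adds one end-of-sentence token per line. The helper length_stats and the path data/lmd_raw/train.txt are hypothetical.

```python
import os
from typing import Iterable, Tuple


def length_stats(lines: Iterable[str], limit: int, add_eos: bool = True) -> Tuple[int, int, int]:
    """Return (num_samples, num_over_limit, longest) for whitespace-tokenized lines.

    Assumes one sample per line; add_eos counts one extra end-of-sentence token per line.
    """
    total = over = longest = 0
    for line in lines:
        n = len(line.split()) + (1 if add_eos else 0)
        total += 1
        longest = max(longest, n)
        if n > limit:
            over += 1
    return total, over, longest


if __name__ == "__main__":
    path = "data/lmd_raw/train.txt"  # hypothetical raw (pre-binarization) file
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            total, over, longest = length_stats(f, limit=512)
        print(f"{over}/{total} samples exceed 512 tokens; longest has {longest}")
```

If nearly every sample exceeds the limit, the longest length gives a rough upper bound for the flag. Memory use grows with sequence length, so the value may need to be balanced against GPU memory.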