diff-svc
I'm running this for the first time and I got this error:
D:\AI\diff-svc>python preprocessing/binarize.py --config training/config_nsf.yaml
| Hparams chains: ['training/config_nsf.yaml']
| Hparams:
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, binarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False},
binarizer_cls: preprocessing.SVCpre.SVCBinarizer, binary_data_dir: data/binary/nseebmytalk, check_val_every_n_epoch: 10, choose_test_manually: False, clip_grad_norm: 1,
config_path: training/config_nsf.yaml, content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2,
cwt_loss: l1, cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9,
dec_layers: 4, decay_steps: 40000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet,
diff_loss_type: l2, dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'],
dur_loss: mse, dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4,
encoder_K: 8, encoder_type: fft, endless_ds: False, f0_bin: 256, f0_max: 1100.0,
f0_min: 40.0, ffn_act: gelu, ffn_padding: SAME, fft_size: 2048, fmax: 16000,
fmin: 40, fs2_ckpt: , gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1,
hidden_size: 256, hop_size: 512, hubert_gpu: True, hubert_path: checkpoints/hubert/hubert_soft.pt, infer: False,
keep_bins: 128, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 0.3,
lambda_sent_dur: 1.0, lambda_uv: 1.0, lambda_word_dur: 1.0, load_ckpt: D:\AI\diff-svc\checkpoints\nsf_hifigan, log_interval: 100,
loud_norm: False, lr: 0.0008, max_beta: 0.02, max_epochs: 3000, max_eval_sentences: 1,
max_eval_tokens: 60000, max_frames: 42000, max_input_tokens: 60000, max_sentences: 88, max_tokens: 128000,
max_updates: 1000000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120,
no_fs2: True, norm_type: gn, num_ckpt_keep: 10, num_heads: 2, num_sanity_val_steps: 1,
num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98,
out_wav_norm: False, pe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, pe_enable: False, perform_enhance: True, pitch_ar: False,
pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l2, pitch_norm: log, pitch_type: frame,
pndm_speedup: 10, pre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1,
predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256,
pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/nseebmytalk, ref_norm_layer: bn,
rel_pos: True, reset_phone_dict: True, residual_channels: 384, residual_layers: 20, save_best: False,
save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear,
seed: 1234, sort_by_len: True, speaker_id: nseebmytalk, spec_max: [0.0], spec_min: [-5.0],
spk_cond_steps: [], stop_token_weight: 5.0, task_cls: training.task.SVC_task.SVCTask, test_ids: [], test_input_dir: ,
test_num: 0, test_prefixes: ['test'], test_set_name: test, timesteps: 1000, train_set_name: train,
use_crepe: True, use_denoise: False, use_energy_embed: False, use_gt_dur: False, use_gt_f0: False,
use_midi: False, use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False,
use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, use_vec: False,
val_check_interval: 2000, valid_num: 0, valid_set_name: valid, validate: False, vocoder: network.vocoders.nsf_hifigan.NsfHifiGAN,
vocoder_ckpt: checkpoints/nsf_hifigan/model, warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 2048,
work_dir: ,
| Binarizer: <class 'preprocessing.SVCpre.SVCBinarizer'>
spkers: {'nseebmytalk'}
| spk_map: {'nseebmytalk': 0}
0%| | 0/5 [00:01<?, ?it/s]
Traceback (most recent call last):
File "D:\AI\diff-svc\preprocessing\binarize.py", line 20, in
I am new to this. I don't know where I went wrong or how to fix it.