whisper-vits-japanese

Error when resuming training from a checkpoint

Open Hyatt-L opened this issue 1 year ago • 2 comments

Here is part of the log. Could the error be because the first training run never fully finished?

[INFO] {'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 800, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 24, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/train_filelist.txt.cleaned', 'validation_files': 'filelists/val_filelist.txt.cleaned', 'text_cleaners': ['japanese_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0, 'cleaned_text': True}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs/isla_base'}
2023-04-14 13:21:23.301526: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-14 13:21:24.298365: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py:563: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
./logs/isla_base/G_0.pth
[INFO] Loaded checkpoint './logs/isla_base/G_0.pth' (iteration 1)
./logs/isla_base/D_0.pth
[INFO] Loaded checkpoint './logs/isla_base/D_0.pth' (iteration 1)
/usr/local/lib/python3.9/dist-packages/torch/functional.py:606: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:803.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.9/dist-packages/torch/functional.py:606: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ../aten/src/ATen/EmptyTensor.cpp:32.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.9/dist-packages/torch/autograd/__init__.py:173: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance. grad.sizes() = [1, 9, 96], strides() = [51936, 96, 1] bucket_view.sizes() = [1, 9, 96], strides() = [864, 96, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:326.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[INFO] Train Epoch: 1 [0%]
[INFO] [6.065904140472412, 6.065133094787598, 0.47868022322654724, 108.19261169433594, 1.6783794164657593, 228.80638122558594, 0, 0.0002]
/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py:563: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
[INFO] Saving model and optimizer state at iteration 1 to ./logs/isla_base/G_0.pth
[INFO] Saving model and optimizer state at iteration 1 to ./logs/isla_base/D_0.pth
[INFO] ====> Epoch: 1
[INFO] ====> Epoch: 2
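
The log shows the run resuming from ./logs/isla_base/G_0.pth and ./logs/isla_base/D_0.pth at iteration 1, i.e. from a checkpoint written at the very start of the first run. To confirm how far the previous run actually got before deciding whether it finished, you can inspect what is stored in the checkpoint files. A minimal sketch, assuming the checkpoints follow the upstream VITS format (a dict with 'model', 'optimizer', 'learning_rate' and 'iteration' keys, as written by utils.save_checkpoint); the paths below match the log above and should be adjusted to your own model_dir:

```python
# Sketch: print the iteration and learning rate stored in the generator and
# discriminator checkpoints that training would resume from.
import torch

for path in ("./logs/isla_base/G_0.pth", "./logs/isla_base/D_0.pth"):
    # map_location="cpu" lets this run without a GPU
    ckpt = torch.load(path, map_location="cpu")
    print(path, "-> iteration:", ckpt.get("iteration"),
          "learning_rate:", ckpt.get("learning_rate"))
```

If both files report iteration 1, the previous run was interrupted before any later checkpoint was saved, so resuming effectively restarts from epoch 1, which matches the "Train Epoch: 1 [0%]" line above.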

Hyatt-L · Apr 14 '23 13:04