openspeech
Training does not work.
❓ Questions & Help
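After setting up the virtual environment, I started training, but the model does not train properly. I tried character, subword, and grapheme units on both Librispeech and Kospeech, and the loss does not decrease in any of them. No error is raised, so I cannot tell what the cause is. The run command and environment information are below. Thank you.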
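One low-cost check for a CTC loss that stays flat without any error: the configuration in this report uses `criterion_name: ctc` with `zero_infinity: true`, and with that flag PyTorch's CTC loss zeroes out infinite losses (for example, when a transcript is longer than the encoder output) instead of surfacing them. The sketch below is not part of openspeech. The function names are hypothetical, and the 4x subsampling factor is an assumption about the Conformer front end, so treat the result as a rough indication only.

```python
from typing import Sequence


def min_ctc_input_length(target: Sequence[int]) -> int:
    """Smallest number of input frames CTC needs to emit `target`.

    Each label needs one frame, plus one blank frame between every
    pair of identical adjacent labels.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def encoder_frames(duration_s: float, frame_shift_ms: float = 10.0,
                   subsampling: int = 4) -> int:
    """Approximate encoder output length for one utterance.

    frame_shift_ms mirrors `frame_shift: 10.0` from the report; the
    subsampling factor of 4 is an ASSUMPTION, not read from openspeech.
    """
    feature_frames = int(duration_s * 1000.0 / frame_shift_ms)
    return feature_frames // subsampling


def is_ctc_feasible(duration_s: float, target: Sequence[int]) -> bool:
    """True if the utterance is long enough for CTC to align `target`."""
    return encoder_frames(duration_s) >= min_ctc_input_length(target)
```

Counting how many training utterances fail `is_ctc_feasible` would show whether a large share of the data contributes zero loss under `zero_infinity: true`.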
Details
Run command
ASR_DATASET=ksponspeech
DATASET_PATH="DB_folder"
TEST_DATASET_PATH="TestDB_folder"
TEST_MANIFEST_DIR="Scripts_folder"
ASR_TOKENIZER=kspon_subword
MANIFEST_FILE_PATH="${DATASET_PATH}/train_sub.txt"
SP_MODEL_PATH="${DATASET_PATH}/labels_sub.model"
CUDA_VISIBLE_DEVICES=1 HYDRA_FULL_ERROR=1 python3 ./openspeech_cli/hydra_train.py \
dataset=${ASR_DATASET} \
+dataset.dataset_download=False \
dataset.dataset_path=${DATASET_PATH} \
dataset.manifest_file_path=${MANIFEST_FILE_PATH} \
dataset.test_dataset_path=${TEST_DATASET_PATH} \
dataset.test_manifest_dir=${TEST_MANIFEST_DIR} \
tokenizer=${ASR_TOKENIZER} \
model=${ASR_MODEL} \
audio=${ASR_AUDIO} \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=gpu \
trainer.batch_size=4 \
trainer.auto_scale_batch_size=False \
+save_checkpoint_n_steps=10000 \
criterion=ctc \
tokenizer.sp_model_path=${SP_MODEL_PATH}
# tokenizer.vocab_path=${VOCAB_FILE_PATH}
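The command as posted expands `${ASR_MODEL}` and `${ASR_AUDIO}` without assigning them. They may well be set elsewhere in the poster's shell, since the resolved configuration shows `conformer` and `fbank`. When sharing a reproducible command, a small helper along these lines can list variables that a script expands but never assigns. It is a hypothetical sketch, not part of openspeech, and it only understands simple `NAME=value` lines and `${NAME}` expansions.

```python
import re

# NAME=value at the start of a line, optionally preceded by `export`.
ASSIGN = re.compile(r"\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=")
# ${NAME} expansions only; bare $NAME is deliberately not handled.
EXPAND = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unassigned_variables(script: str) -> list:
    """Return ${NAME} references that have no NAME= line in the script.

    Comment lines are skipped. Assignment order is ignored, so a
    variable assigned after its first use is still treated as assigned.
    """
    lines = [ln for ln in script.splitlines()
             if not ln.lstrip().startswith("#")]
    assigned = set()
    for ln in lines:
        match = ASSIGN.match(ln)
        if match:
            assigned.add(match.group(1))
    missing = []
    for ln in lines:
        for name in EXPAND.findall(ln):
            if name not in assigned and name not in missing:
                missing.append(name)
    return missing
```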
Model information
audio:
  name: fbank
  sample_rate: 16000
  frame_length: 20.0
  frame_shift: 10.0
  del_silence: false
  num_mels: 80
  apply_spec_augment: true
  apply_noise_augment: false
  apply_time_stretch_augment: false
  apply_joining_augment: false
augment:
  apply_spec_augment: false
  apply_noise_augment: false
  apply_joining_augment: false
  apply_time_stretch_augment: false
  freq_mask_para: 27
  freq_mask_num: 2
  time_mask_num: 4
  noise_dataset_dir: None
  noise_level: 0.7
  time_stretch_min_rate: 0.7
  time_stretch_max_rate: 1.4
dataset:
  dataset: ksponspeech
  dataset_path: /ssd1/DB/KsponSpeech
  test_dataset_path: /ssd1/DB
  manifest_file_path: /ssd1/DB/KsponSpeech/train_sub.txt
  test_manifest_dir: /ssd1/DB/KsponSpeech_scripts
  preprocess_mode: phonetic
  dataset_download: false
criterion:
  criterion_name: ctc
  reduction: mean
  zero_infinity: true
lr_scheduler:
  lr: 0.0001
  scheduler_name: warmup_reduce_lr_on_plateau
  lr_patience: 1
  lr_factor: 0.3
  peak_lr: 0.0001
  init_lr: 1.0e-10
  warmup_steps: 4000
model:
  model_name: conformer
  encoder_dim: 512
  num_encoder_layers: 17
  num_attention_heads: 8
  feed_forward_expansion_factor: 4
  conv_expansion_factor: 2
  input_dropout_p: 0.1
  feed_forward_dropout_p: 0.1
  attention_dropout_p: 0.1
  conv_dropout_p: 0.1
  conv_kernel_size: 31
  half_step_residual: true
  optimizer: adam
trainer:
  seed: 1
  accelerator: dp
  accumulate_grad_batches: 1
  num_workers: 4
  batch_size: 4
  check_val_every_n_epoch: 1
  gradient_clip_val: 5.0
  logger: wandb
  max_epochs: 20
  save_checkpoint_n_steps: 10000
  auto_scale_batch_size: 'False'
  sampler: else
  name: gpu
  device: gpu
  use_cuda: true
  auto_select_gpus: true
tokenizer:
  sos_token: <s>
  eos_token: </s>
  pad_token: <pad>
  blank_token: <blank>
  encoding: utf-8
  unit: kspon_subword
  sp_model_path: /ssd1/DB/KsponSpeech/labels_sub.model
  vocab_size: 3200
save_checkpoint_n_steps: 10000
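To get a feel for how slowly the learning rate ramps with these `lr_scheduler` settings, here is a rough sketch of a linear warmup from `init_lr` to `peak_lr` over `warmup_steps`. Linear interpolation is an assumption about how `warmup_reduce_lr_on_plateau` behaves during warmup; the actual openspeech scheduler may differ, and `warmup_lr` is a hypothetical helper.

```python
def warmup_lr(step, init_lr=1.0e-10, peak_lr=1.0e-4, warmup_steps=4000):
    """Learning rate at `step` under an ASSUMED linear warmup.

    Defaults mirror init_lr, peak_lr and warmup_steps from the report.
    After warmup the rate is held at peak_lr; the plateau-based decay
    (lr_patience, lr_factor) is not modelled here.
    """
    if step >= warmup_steps:
        return peak_lr
    return init_lr + (peak_lr - init_lr) * step / warmup_steps


# Print the assumed learning rate at a few points of the warmup.
for step in (0, 1000, 2000, 4000):
    print(step, warmup_lr(step))
```

With `batch_size: 4` and `accumulate_grad_batches: 1`, 4000 warmup steps cover roughly 16,000 utterances, assuming one optimizer step per batch, so the loss curve over the first few thousand steps may say little about whether training will eventually converge.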
Global seed set to 1
[2024-01-31 14:01:57,957][openspeech.utils][INFO] - Operating System : Linux 6.5.0-15-generic
[2024-01-31 14:01:57,957][openspeech.utils][INFO] - Processor : x86_64
[2024-01-31 14:01:57,957][openspeech.utils][INFO] - device : NVIDIA GeForce RTX 4090
[2024-01-31 14:01:57,958][openspeech.utils][INFO] - CUDA is available : True
[2024-01-31 14:01:57,958][openspeech.utils][INFO] - CUDA version : 11.1
[2024-01-31 14:01:57,958][openspeech.utils][INFO] - PyTorch version : 1.8.1+cu111
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
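The log pairs an NVIDIA GeForce RTX 4090 with a PyTorch build compiled for CUDA 11.1. Whether that combination has anything to do with the flat loss is not established here, but it is cheap to check whether the installed build was compiled for the GPU's architecture. The sketch below relies on `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`. The helper `build_supports_device` is hypothetical, and the torch import is guarded so the pure check can run on a machine without a GPU.

```python
def build_supports_device(arch_list, capability):
    """True if the build lists an exact sm_XY entry for the device.

    arch_list:  strings such as "sm_86", as returned by
                torch.cuda.get_arch_list().
    capability: (major, minor) tuple, as returned by
                torch.cuda.get_device_capability().
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        archs = torch.cuda.get_arch_list()
        cap = torch.cuda.get_device_capability(0)
        print("compiled for:", archs)
        print("device capability:", cap)
        print("exact match:", build_supports_device(archs, cap))
```

A missing exact match does not by itself prove the build is unusable, since the code may still run through CUDA's compatibility mechanisms, but it is a reasonable prompt to retry with a PyTorch build that targets the device directly.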
conda list
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 2.1.0 pypi_0 pypi
aiohttp 3.9.1 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
astropy 5.2.2 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
audioread 3.0.1 pypi_0 pypi
ca-certificates 2023.11.17 hbcca054_0 conda-forge
cachetools 5.3.2 pypi_0 pypi
certifi 2023.11.17 pypi_0 pypi
cffi 1.16.0 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
ctcdecode 1.0.3 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
docker-pycreds 0.4.0 pypi_0 pypi
filelock 3.13.1 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2023.12.2 pypi_0 pypi
future 0.18.3 pypi_0 pypi
gitdb 4.0.11 pypi_0 pypi
gitpython 3.1.41 pypi_0 pypi
google-auth 2.27.0 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
grpcio 1.60.0 pypi_0 pypi
hydra-core 1.3.2 pypi_0 pypi
idna 3.6 pypi_0 pypi
importlib-metadata 7.0.1 pypi_0 pypi
importlib-resources 6.1.1 pypi_0 pypi
jinja2 3.1.3 pypi_0 pypi
joblib 1.3.2 pypi_0 pypi
lazy-loader 0.3 pypi_0 pypi
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
levenshtein 0.23.0 pypi_0 pypi
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 13.2.0 h807b86a_3 conda-forge
libgomp 13.2.0 h807b86a_3 conda-forge
librosa 0.9.2 pypi_0 pypi
libsqlite 3.44.2 h2797004_0 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_3 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
lightning-utilities 0.10.1 pypi_0 pypi
llvmlite 0.41.1 pypi_0 pypi
markdown 3.5.2 pypi_0 pypi
markupsafe 2.1.4 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.0.7 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
ncurses 6.4 h59595ed_2 conda-forge
networkx 3.1 pypi_0 pypi
numba 0.58.1 pypi_0 pypi
numpy 1.24.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.18.1 pypi_0 pypi
nvidia-nvjitlink-cu12 12.3.101 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
omegaconf 2.3.0 pypi_0 pypi
openspeech-core 0.4.0 dev_0 <develop>
openssl 1.1.1w hd590300_0 conda-forge
packaging 23.2 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 10.2.0 pypi_0 pypi
pip 23.3.2 pyhd8ed1ab_0 conda-forge
platformdirs 4.1.0 pypi_0 pypi
pooch 1.8.0 pypi_0 pypi
protobuf 4.25.2 pypi_0 pypi
psutil 5.9.8 pypi_0 pypi
pyasn1 0.5.1 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
pybind11 2.11.1 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pydeprecate 0.3.1 pypi_0 pypi
pyerfa 2.0.0.3 pypi_0 pypi
python 3.8.0 h357f687_5 conda-forge
python-dateutil 2.8.2 pypi_0 pypi
python-levenshtein 0.23.0 pypi_0 pypi
pytorch-lightning 1.4.0 pypi_0 pypi
pytz 2023.3.post1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
rapidfuzz 3.6.1 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
requests 2.31.0 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
resampy 0.4.2 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-learn 1.3.2 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
sentry-sdk 1.39.2 pypi_0 pypi
setproctitle 1.3.3 pypi_0 pypi
setuptools 69.0.3 pyhd8ed1ab_0 conda-forge
six 1.16.0 pypi_0 pypi
smmap 5.0.1 pypi_0 pypi
soundfile 0.12.1 pypi_0 pypi
soxr 0.3.7 pypi_0 pypi
sqlite 3.44.2 h2c6b66d_0 conda-forge
sympy 1.12 pypi_0 pypi
tensorboard 2.14.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
threadpoolctl 3.2.0 pypi_0 pypi
tk 8.6.13 noxft_h4845f30_101 conda-forge
torch 1.8.1+cu111 pypi_0 pypi
torchaudio 0.8.1 pypi_0 pypi
torchmetrics 0.4.0 pypi_0 pypi
torchvision 0.9.1+cu111 pypi_0 pypi
tqdm 4.66.1 pypi_0 pypi
triton 2.1.0 pypi_0 pypi
typing-extensions 4.9.0 pypi_0 pypi
tzdata 2023.4 pypi_0 pypi
urllib3 2.1.0 pypi_0 pypi
wandb 0.16.2 pypi_0 pypi
warp-rnnt 0.4.0 pypi_0 pypi
werkzeug 3.0.1 pypi_0 pypi
wget 3.2 pypi_0 pypi
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yarl 1.9.4 pypi_0 pypi
zipp 3.17.0 pypi_0 pypi
zlib 1.2.13 hd590300_5 conda-forge
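The package list mixes `torch 1.8.1+cu111` with several `nvidia-*-cu12` wheels, which may be leftovers from a different PyTorch install. Whether they matter at runtime is unclear. A hypothetical helper like the following, run over `conda list`-style text, makes such mixtures easier to spot by grouping packages by the CUDA tag found in their name or version.

```python
import re
from collections import defaultdict

# Matches tags such as cu12 or cu111 in a package name or version.
CUDA_TAG = re.compile(r"cu(\d{2,3})(?!\d)")


def cuda_tags(listing: str) -> dict:
    """Group package names by the CUDA tag in their name or version.

    Expects whitespace-separated `name version ...` lines; header and
    comment lines starting with `#` are ignored.
    """
    groups = defaultdict(list)
    for line in listing.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, version = fields[0], fields[1]
        match = CUDA_TAG.search(name) or CUDA_TAG.search(version)
        if match:
            groups["cu" + match.group(1)].append(name)
    return dict(groups)
```

More than one key in the returned dictionary suggests that packages built against different CUDA versions are installed side by side, which is a hint to rebuild the environment from a clean state.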
Dataset structure
dataset.dataset_path: $BASE_PATH/KsponSpeech
$BASE_PATH/KsponSpeech
├── KsponSpeech_01
├── KsponSpeech_02
├── KsponSpeech_03
├── KsponSpeech_04
├── KsponSpeech_05
└── etc... (csv, model, txt...)
dataset.test_dataset_path: $BASE_PATH/KsponSpeech_eval
$BASE_PATH/KsponSpeech_eval
├── eval_clean
└── eval_other
dataset.test_manifest_dir: $BASE_PATH/KsponSpeech_scripts
$BASE_PATH/KsponSpeech_scripts
├── eval_clean.trn
└── eval_other.trn
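The layout above can be verified before launching training. The sketch below only checks that the directories and `.trn` files named in this report exist under a given base path. `missing_entries` and the `EXPECTED` list are hypothetical and simply mirror the tree shown here.

```python
from pathlib import Path

# Relative paths taken from the dataset structure in this report.
EXPECTED = [
    "KsponSpeech/KsponSpeech_01",
    "KsponSpeech/KsponSpeech_02",
    "KsponSpeech/KsponSpeech_03",
    "KsponSpeech/KsponSpeech_04",
    "KsponSpeech/KsponSpeech_05",
    "KsponSpeech_eval/eval_clean",
    "KsponSpeech_eval/eval_other",
    "KsponSpeech_scripts/eval_clean.trn",
    "KsponSpeech_scripts/eval_other.trn",
]


def missing_entries(base_path, expected=EXPECTED):
    """Return the expected relative paths that do not exist under base_path."""
    base = Path(base_path)
    return [rel for rel in expected if not (base / rel).exists()]
```

In the resolved configuration earlier in this report, `test_dataset_path` is `/ssd1/DB` rather than a `KsponSpeech_eval` directory, which may be worth double-checking against this layout.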
Thank you.
@Seoyoung-Jo I'll look into it. Thank you.