TensorFlowASR
Testing ContextNet is extremely slow in the refactored main branch
Hello, and firstly, thank you for solving the beam issue. Now, running test.py with the trained model, the ETA is huge: around 5 hours. Previously it completed within 30 minutes. I am running on a V100 GPU.
@vaibhav016 The beam search is not fully optimized, so it will take a long time to finish. But I believe that once the WER drops below 10%, the difference between greedy and beam search is insignificant.
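A rough way to see why beam search costs more than greedy decoding for an RNN-T-style model: greedy keeps a single hypothesis and expands it once per frame, while a beam of width B expands and rescores B hypotheses at every frame. A toy cost model (illustrative numbers only, not TensorFlowASR's actual decoder):

```python
def greedy_expansions(num_frames):
    # Greedy keeps a single hypothesis and expands it once per frame
    # (ignoring frames that emit multiple tokens).
    return num_frames

def beam_expansions(num_frames, beam_width):
    # Beam search keeps `beam_width` hypotheses alive and expands every
    # one of them at every frame, so the joint-network work scales
    # roughly linearly with the beam width.
    return num_frames * beam_width

frames = 1000  # roughly 10 s of audio at a 10 ms feature stride
print(greedy_expansions(frames))   # 1000
print(beam_expansions(frames, 5))  # 5000 -> ~5x the decoder work
```

With the `beam_width: 5` used in the configs below, that alone accounts for several times the greedy runtime, before any per-hypothesis bookkeeping overhead.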
@usimarit Okay, thank you for this info. But how should we obtain the metrics given in the ContextNet paper? Do we have to make some modifications to the config file?
@vaibhav016 We need to train and test more cases; currently even Conformer hasn't reached the WER of the paper.
@usimarit Okay. Thanks a lot for the support.
@vaibhav016 If you use greedy only, do you still meet this problem?
@usimarit Unfortunately yes. I trained my model on train-clean-100 only and tested it. It's taking approximately 4 hours to complete.
@vaibhav016 That's weird. For CPU only, I tested on my machine with 8 cores, and it only took around 20 minutes for Conformer (on LibriSpeech test-clean) with batch size 1.
@usimarit Give me a day's time; I will pull the latest branch and check again. Will update you soon on this.
Hi, I am facing the same problem: the ContextNet test.py takes forever. I am wondering if I need to make a modification or use a specific configuration? Thanks for this great project.
TensorFlow 2.8.0 (on GPU and CPU), Python 3.8, pip install from source (master)
Config file:
speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  feature_type: log_mel_spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_frame: False

decoder_config:
  vocabulary: /main/datasets/libra/train/train.subwords
  # vocabulary: null
  target_vocab_size: 1024
  max_subword_length: 10
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: contextnet
  encoder_alpha: 0.5
  encoder_blocks: ...
  prediction_embed_dim: 640
  prediction_embed_dropout: 0
  prediction_num_rnns: 1
  prediction_rnn_units: 640
  prediction_rnn_type: lstm
  prediction_rnn_implementation: 1
  prediction_layer_norm: True
  prediction_projection_units: 0
  joint_dim: 640
  joint_activation: tanh

learning_config:
  train_dataset_config:
    use_tf: True
    augmentation_config:
      feature_augment:
        time_masking:
          num_masks: 10
          mask_factor: 100
          p_upperbound: 0.05
        freq_masking:
          num_masks: 1
          mask_factor: 27
    data_paths:
      - /main/datasets/libra/train/train.tsv
    tfrecords_dir: null
    shuffle: True
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: train

  eval_dataset_config:
    use_tf: True
    data_paths:
      - /main/datasets/libra/dev/dev.tsv
    tfrecords_dir: null
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: eval

  test_dataset_config:
    use_tf: True
    data_paths:
      - /main/datasets/libra/test/test.tsv
    tfrecords_dir: null
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: test

  optimizer_config:
    warmup_steps: 40000
    beta_1: 0.9
    beta_2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 4 # 2
    num_epochs: 20
    checkpoint:
      filepath: /main/models/local/contextnet/libra/checkpoints/{epoch:02d}.h5
      save_best_only: False
      save_weights_only: True
      save_freq: epoch
    states_dir: /main/models/local/contextnet/libra/states
    tensorboard:
      log_dir: /main/models/local/contextnet/libra/tensorboard
      histogram_freq: 1
      write_graph: True
      write_images: True
      update_freq: 100 # epoch
      profile_batch: 2
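As a rough sanity check on the speech_config above: with a 10 ms stride the feature extractor emits about 100 frames per second of audio, and every remaining encoder output frame costs at least one decoder step under greedy decoding. A small helper (plain Python, not part of TensorFlowASR; the 8x time reduction assumes ContextNet's three stride-2 blocks):

```python
def encoder_frames(duration_s, stride_ms=10, time_reduction=8):
    # Feature frames from the front end, divided by the encoder's total
    # convolutional stride (ContextNet: three stride-2 blocks -> 8x).
    feature_frames = int(duration_s * 1000 / stride_ms)
    return feature_frames // time_reduction

print(encoder_frames(10.0))                    # 125 decoder steps for a 10 s clip
print(encoder_frames(10.0, time_reduction=1))  # 1000 raw feature frames
```

So per-utterance decoder work is modest; if a test run takes hours anyway, the cost is likely dominated by per-step overhead in the decode loop rather than by sequence length.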
@HesNobi Did you meet this issue on the up-to-date main branch?
@HesNobi @nglehuy Any suggestions on how to fix this? I am facing the same issue: test.py is extremely slow, taking 1 hour to run inference on 8 data points. I am on the main branch, and training went through without issues. Here is my config; any tips would be appreciated:
# Copyright 2020 Huy Le Nguyen (@usimarit)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
speech_config:
  sample_rate: 16000
  frame_ms: 24
  stride_ms: 16
  num_feature_bins: 40
  feature_type: log_mel_spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_frame: False

decoder_config:
  # vocabulary: ./vocabularies/librispeech/librispeech_train_4_1030.subwords
  vocabulary: TensorFlowASR/vocabularies/librispeech/librispeech_train_4_1030.subwords
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: contextnet
  encoder_alpha: 0.15
  encoder_blocks:
    # C0
    - nlayers: 1
      kernel_size: 5
      filters: 256
      strides: 1
      residual: False
      activation: silu
    # C1-C2
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C3
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 2
      residual: True
      activation: silu
    # C4-C6
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C7
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 2
      residual: True
      activation: silu
    # C8-C10
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C11-C13
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    # C14
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 2
      residual: True
      activation: silu
    # C15-C21
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    # C22
    - nlayers: 1
      kernel_size: 5
      filters: 640
      strides: 1
      residual: False
      activation: silu
  prediction_embed_dim: 640
  prediction_embed_dropout: 0
  prediction_num_rnns: 1
  prediction_rnn_units: 640
  prediction_rnn_type: lstm
  prediction_rnn_implementation: 1
  prediction_layer_norm: True
  prediction_projection_units: 0
  joint_dim: 640
  joint_activation: tanh

learning_config:
  train_dataset_config:
    use_tf: True
    augmentation_config:
      feature_augment:
        time_masking:
          num_masks: 10
          mask_factor: 100
          p_upperbound: 0.05
        freq_masking:
          num_masks: 1
          mask_factor: 27
    data_paths:
      - /LibriSpeech/LibriSpeech/train-clean-100/transcripts.tsv
    tfrecords_dir: null
    shuffle: True
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: train

  eval_dataset_config:
    use_tf: True
    data_paths:
      - /LibriSpeech/LibriSpeech/dev-clean/transcripts.tsv
    tfrecords_dir: null
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: eval

  test_dataset_config:
    use_tf: True
    data_paths:
      - /LibriSpeech/LibriSpeech/test-clean/transcripts_short.tsv
    tfrecords_dir: null
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: test

  optimizer_config:
    warmup_steps: 40000
    beta_1: 0.9
    beta_2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 2
    num_epochs: 20
    checkpoint:
      # filepath: D:/Models/local/contextnet/checkpoints/{epoch:02d}.h5
      filepath: TensorFlowASR/examples/contextnet/checkpoints/{epoch:02d}.h5
      save_best_only: False
      save_weights_only: True
      save_freq: epoch
    states_dir: TensorFlowASR/examples/contextnet/states
    tensorboard:
      log_dir: TensorFlowASR/examples/contextnet/tensorboard
      histogram_freq: 1
      write_graph: True
      write_images: True
      update_freq: epoch
      profile_batch: 2
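One way to narrow down where the hours go is to time each stage per utterance instead of relying on the overall ETA. A generic harness sketch (the lambda stages below are placeholders to swap for the real feature-extraction, encoder, and decode calls; this is not TensorFlowASR's API):

```python
import time

def timed(label, fn, *args):
    # Run fn(*args), report wall-clock time, and pass the result through
    # so stages can be chained.
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

# Placeholder pipeline; substitute real calls to see which stage dominates.
audio = [0.0] * 16000  # 1 s of silence at 16 kHz
features = timed("features", lambda a: a, audio)
encoded = timed("encoder", lambda f: f, features)
hypothesis = timed("decode", lambda e: "", encoded)
```

If the decode stage dominates even with greedy decoding, the bottleneck is likely the per-step decode loop rather than the encoder forward pass.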