
Testing contextnet is extremely slow in the refactored main branch

Open vaibhav016 opened this issue 3 years ago • 11 comments

Hello, firstly, thank you for solving the beam issue. Now, when running test.py with the trained model, the ETA is huge: around 5 hours. Previously it completed within 30 minutes. I am running on a V100 GPU.

vaibhav016 avatar Apr 20 '21 08:04 vaibhav016

@vaibhav016 The beam search is not fully optimized, so it will take a long time to finish. But I believe once the WER drops below 10%, the difference between greedy and beam search is insignificant.
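
For context on why greedy is so much cheaper: greedy transducer decoding keeps a single hypothesis and does one joint-network evaluation per emission step, while beam search keeps `beam_width` hypotheses and rescores each of them at every step. A minimal sketch of the greedy loop, using a hypothetical stand-in `joint()` function rather than the real model's joint network:

```python
# Toy greedy transducer decode: one hypothesis, one joint() call per step.
BLANK = 0

def joint(frame, last_token):
    # Hypothetical stand-in for the joint network: emits token
    # (frame % 3 + 1) once per frame, then blank.
    scores = [0.0, 0.0, 0.0, 0.0]
    target = frame % 3 + 1
    if last_token != target:
        scores[target] = 1.0
    else:
        scores[BLANK] = 1.0
    return scores

def greedy_decode(num_frames, max_symbols_per_frame=3):
    tokens, last = [], BLANK
    for t in range(num_frames):
        for _ in range(max_symbols_per_frame):
            scores = joint(t, last)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:
                break  # blank: advance to the next encoder frame
            tokens.append(best)
            last = best
    return tokens

print(greedy_decode(4))  # [1, 2, 3, 1] with this dummy joint
```

Beam search repeats the inner scoring for every live hypothesis, which is where the extra wall-clock time goes.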

nglehuy avatar Apr 20 '21 12:04 nglehuy

@usimarit Okay, thank you for the info. But how should we obtain the metrics given in the ContextNet paper? Do we have to make some modifications in the config file?

vaibhav016 avatar Apr 20 '21 15:04 vaibhav016

@vaibhav016 We need to train and test more cases; currently even Conformer hasn't reached the WER of the paper.
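
For reference, the WER being compared against the paper is the word-level edit distance divided by the number of reference words. A minimal sketch (not the repo's own metric implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance on words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution -> ~0.333
```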

nglehuy avatar Apr 20 '21 16:04 nglehuy

@usimarit Okay. Thanks a lot for the support.

vaibhav016 avatar Apr 20 '21 19:04 vaibhav016

@vaibhav016 If you use greedy only, do you still meet this problem?

nglehuy avatar Apr 28 '21 15:04 nglehuy

@usimarit Unfortunately yes. I trained my model on only train-clean-100 and tested it. It's taking approximately 4 hours to complete.

vaibhav016 avatar May 01 '21 07:05 vaibhav016

@vaibhav016 That's weird. Testing CPU-only on my 8-core machine, conformer took only around 20 minutes (on LibriSpeech test-clean) with batch size 1.
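
To put those numbers side by side: LibriSpeech test-clean has 2620 utterances, so the two reported runtimes imply per-utterance decode times that differ by an order of magnitude. A quick back-of-the-envelope check:

```python
# Per-utterance decode time; 2620 is the standard utterance count of
# LibriSpeech test-clean.
utterances = 2620
fast_run_s = 20 * 60    # ~20 minutes reported for conformer on CPU
slow_run_s = 4 * 3600   # ~4 hours reported for the slow case

print(f"fast: {fast_run_s / utterances:.2f} s/utterance")  # 0.46
print(f"slow: {slow_run_s / utterances:.2f} s/utterance")  # 5.50
```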

nglehuy avatar May 01 '21 07:05 nglehuy

@usimarit Give me a day's time; I will pull the latest branch and check again. Will update you soon on this.

vaibhav016 avatar May 01 '21 07:05 vaibhav016

Hi. I am facing the same problem: the contextnet test.py takes forever. I am wondering if I need to make a modification or use any specific configuration? Thanks for this great project.


TensorFlow 2.8.0 (on GPU and CPU), Python 3.8, installed via pip from source (master)

Config file:

speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  feature_type: log_mel_spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_frame: False

decoder_config:
  vocabulary: /main/datasets/libra/train/train.subwords
  # vocabulary: null
  target_vocab_size: 1024
  max_subword_length: 10
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: contextnet
  encoder_alpha: 0.5
  encoder_blocks: ...
  prediction_embed_dim: 640
  prediction_embed_dropout: 0
  prediction_num_rnns: 1
  prediction_rnn_units: 640
  prediction_rnn_type: lstm
  prediction_rnn_implementation: 1
  prediction_layer_norm: True
  prediction_projection_units: 0
  joint_dim: 640
  joint_activation: tanh

learning_config:
  train_dataset_config:
    use_tf: True
    augmentation_config:
      feature_augment:
        time_masking:
          num_masks: 10
          mask_factor: 100
          p_upperbound: 0.05
        freq_masking:
          num_masks: 1
          mask_factor: 27
    data_paths:
      - /main/datasets/libra/train/train.tsv
    tfrecords_dir: null
    shuffle: True
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: train

  eval_dataset_config:
    use_tf: True
    data_paths:
      - /main/datasets/libra/dev/dev.tsv
    tfrecords_dir: null
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: eval

  test_dataset_config:
    use_tf: True
    data_paths:
      - /main/datasets/libra/test/test.tsv
    tfrecords_dir: null
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: test

  optimizer_config:
    warmup_steps: 40000
    beta_1: 0.9
    beta_2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 4 # 2
    num_epochs: 20
    checkpoint:
      filepath: /main/models/local/contextnet/libra/checkpoints/{epoch:02d}.h5
      save_best_only: False
      save_weights_only: True
      save_freq: epoch
    states_dir: /main/models/local/contextnet/libra/states
    tensorboard:
      log_dir: /main/models/local/contextnet/libra/tensorboard
      histogram_freq: 1
      write_graph: True
      write_images: True
      update_freq: 100 # epoch
      profile_batch: 2
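
To narrow down where the time goes, one option is to time each decoded batch separately. This is a generic sketch, not tied to this repo's test.py internals; `decode_fn` here is a hypothetical stand-in for whatever the test loop calls per batch:

```python
import time

def profile_decode(batches, decode_fn):
    """Run decode_fn on each batch; return per-batch wall-clock durations (s)."""
    durations = []
    for batch in batches:
        start = time.perf_counter()
        decode_fn(batch)
        durations.append(time.perf_counter() - start)
    return durations

# Usage with a stand-in decode function:
timings = profile_decode(range(3), lambda b: sum(range(10_000)))
print(f"mean {sum(timings) / len(timings):.6f} s/batch")
```

If the first batch dominates, the cost is likely graph tracing/compilation rather than decoding itself.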

HesNobi avatar May 01 '22 05:05 HesNobi

@HesNobi Did you meet this issue on the up-to-date main branch?

nglehuy avatar Sep 02 '22 05:09 nglehuy

@HesNobi @nglehuy Any suggestions on how to fix this? I am facing the same issue: test.py is extremely slow, taking 1 hour to run inference on 8 data points. I am on the main branch, and training went through without issues. Here is my config; any tips would be appreciated:

# Copyright 2020 Huy Le Nguyen (@usimarit)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

speech_config:
  sample_rate: 16000
  frame_ms: 24
  stride_ms: 16
  num_feature_bins: 40
  feature_type: log_mel_spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_frame: False

decoder_config:
  # vocabulary: ./vocabularies/librispeech/librispeech_train_4_1030.subwords
  vocabulary: TensorFlowASR/vocabularies/librispeech/librispeech_train_4_1030.subwords
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: contextnet
  encoder_alpha: 0.15
  encoder_blocks:
    # C0
    - nlayers: 1
      kernel_size: 5
      filters: 256
      strides: 1
      residual: False
      activation: silu
    # C1-C2
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C3
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 2
      residual: True
      activation: silu
    # C4-C6
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C7
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 2
      residual: True
      activation: silu
    # C8 - C10
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C11 - C13
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    # C14
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 2
      residual: True
      activation: silu
    # C15 - C21
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    # C22
    - nlayers: 1
      kernel_size: 5
      filters: 640
      strides: 1
      residual: False
      activation: silu
  prediction_embed_dim: 640
  prediction_embed_dropout: 0
  prediction_num_rnns: 1
  prediction_rnn_units: 640
  prediction_rnn_type: lstm
  prediction_rnn_implementation: 1
  prediction_layer_norm: True
  prediction_projection_units: 0
  joint_dim: 640
  joint_activation: tanh

learning_config:
  train_dataset_config:
    use_tf: True
    augmentation_config:
      feature_augment:
        time_masking:
          num_masks: 10
          mask_factor: 100
          p_upperbound: 0.05
        freq_masking:
          num_masks: 1
          mask_factor: 27
    data_paths:
      - /LibriSpeech/LibriSpeech/train-clean-100/transcripts.tsv
    tfrecords_dir: null
    shuffle: True
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: train

  eval_dataset_config:
    use_tf: True
    data_paths:
      - /LibriSpeech/LibriSpeech/dev-clean/transcripts.tsv
    tfrecords_dir: null
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: eval

  test_dataset_config:
    use_tf: True
    data_paths:
      - /LibriSpeech/LibriSpeech/test-clean/transcripts_short.tsv
    tfrecords_dir: null
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: test

  optimizer_config:
    warmup_steps: 40000
    beta_1: 0.9
    beta_2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 2
    num_epochs: 20
    checkpoint:
      # filepath: D:/Models/local/contextnet/checkpoints/{epoch:02d}.h5
      filepath: TensorFlowASR/examples/contextnet/checkpoints/{epoch:02d}.h5
      save_best_only: False
      save_weights_only: True
      save_freq: epoch
    states_dir: TensorFlowASR/examples/contextnet/states
    tensorboard:
      log_dir: TensorFlowASR/examples/contextnet/tensorboard
      histogram_freq: 1
      write_graph: True
      write_images: True
      update_freq: epoch
      profile_batch: 2

ROZBEH avatar Dec 01 '23 01:12 ROZBEH