TensorFlowASR

WordPiece, SentencePiece, Refactor, Correct configurations

Open • nglehuy opened this issue 3 years ago • 0 comments

Features

  • Add another text featurizer based on WordPiece, using tensorflow_text.FastWordpieceTokenizer
  • Update SentencepieceFeaturizer to use tensorflow_text.FastSentencepieceTokenizer
  • Add a tf_extract function to text_featurizer so datasets can run on TPUs (enabled with the use_tf: True option)
  • Add a jit_compile option to the model's compile() for faster fixed-shape training with XLA
  • Add Gaussian weight noise to the transducer decoder, with wrapper functions to apply and remove the noise
  • Add convolutional blur pooling
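For readers new to WordPiece: tensorflow_text.FastWordpieceTokenizer is designed to give the same output as the classic greedy longest-match-first WordPiece algorithm, only with a faster matching procedure. The pure-Python sketch below illustrates that greedy algorithm on a single, already-split word. The toy vocabulary and the function name `wordpiece_tokenize` are hypothetical; the `##` suffix indicator and `[UNK]` fallback mirror FastWordpieceTokenizer's defaults. This is not TensorFlowASR's featurizer code.

```python
def wordpiece_tokenize(word, vocab, suffix_indicator="##", unknown_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word (illustrative)."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = suffix_indicator + piece  # non-initial pieces carry the suffix marker
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is in the vocabulary: the whole word is unknown.
            return [unknown_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
vocab = {"speech", "recog", "##ni", "##tion", "##s"}
print(wordpiece_tokenize("recognition", vocab))  # ['recog', '##ni', '##tion']
print(wordpiece_tokenize("speechs", vocab))      # ['speech', '##s']
print(wordpiece_tokenize("xyz", vocab))          # ['[UNK]']
```

Falling back to a single unknown token for the whole word (rather than per piece) follows the usual BERT-style WordPiece behavior.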
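The weight-noise entry mentions wrapper functions to apply and remove the noise. The sketch below shows one way such a pair can work, assuming the common scheme of perturbing weights with N(0, σ²) noise for the forward/backward pass and restoring the clean weights before the optimizer update. It is plain Python on a hypothetical `TinyDecoder` holding lists of floats, not TensorFlowASR's implementation; `apply_weight_noise`, `remove_weight_noise`, `weight_noise` and the 0.075 standard deviation are illustrative names and values.

```python
import random
from contextlib import contextmanager


class TinyDecoder:
    """Hypothetical stand-in for a decoder: a dict of named weight lists."""

    def __init__(self, weights):
        self.weights = weights


def apply_weight_noise(layer, stddev, rng):
    """Add N(0, stddev^2) noise to every weight; return the clean values for later removal."""
    clean = {name: list(values) for name, values in layer.weights.items()}
    for name, values in clean.items():
        layer.weights[name] = [v + rng.gauss(0.0, stddev) for v in values]
    return clean


def remove_weight_noise(layer, clean):
    """Restore the clean (noise-free) weights saved by apply_weight_noise."""
    for name, values in clean.items():
        layer.weights[name] = list(values)


@contextmanager
def weight_noise(layer, stddev, rng):
    """Weights are noisy inside the block and clean again afterwards.

    Compute gradients inside the block; apply the optimizer update after exit,
    so the update lands on the restored clean weights.
    """
    clean = apply_weight_noise(layer, stddev, rng)
    try:
        yield layer
    finally:
        remove_weight_noise(layer, clean)


decoder = TinyDecoder({"kernel": [0.5, -1.0, 2.0]})
rng = random.Random(0)
with weight_noise(decoder, stddev=0.075, rng=rng):
    noisy = list(decoder.weights["kernel"])  # a forward/backward pass would run here
print(noisy != [0.5, -1.0, 2.0])   # True
print(decoder.weights["kernel"])   # [0.5, -1.0, 2.0]
```

Using a context manager keeps "apply" and "remove" paired, so the clean weights come back even if the step raises an exception.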
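Assuming "convolutional blur pooling" refers to anti-aliased blur pooling (low-pass filter the input, then subsample with a stride), the idea can be sketched in one dimension with plain Python. The binomial kernel [1, 2, 1]/4, stride 2, edge padding and the name `blur_pool_1d` are assumptions for illustration; this is not the TensorFlowASR layer.

```python
def blur_pool_1d(signal, stride=2, kernel=(1.0, 2.0, 1.0)):
    """Low-pass filter with a normalized binomial kernel, then subsample by `stride`."""
    total = sum(kernel)
    weights = [k / total for k in kernel]
    half = len(weights) // 2
    # Edge padding (repeat the border sample) keeps the output aligned with the input.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    blurred = [
        sum(w * padded[i + j] for j, w in enumerate(weights))
        for i in range(len(signal))
    ]
    return blurred[::stride]


x = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0]
shifted = [4.0, 0.0, 4.0, 0.0, 4.0, 0.0]  # x shifted by one sample
print(x[::2], shifted[::2])                    # [0.0, 0.0, 0.0] [4.0, 4.0, 4.0]
print(blur_pool_1d(x), blur_pool_1d(shifted))  # [1.0, 2.0, 2.0] [3.0, 2.0, 2.0]
```

Plain stride-2 subsampling flips between all zeros and all fours when the input shifts by one sample, while the blurred output changes much less; that shift robustness is the usual motivation for blur pooling.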

Fixes

  • Refactor code (fix models, functions, unit tests, configs, ...)
  • Update ASRDatasets to support text-featurizer-independent TFRecords (records now hold only the audio and transcript, instead of the audio and token indices)
  • Replace some experimental options with their officially supported equivalents
  • Drop support for TensorFlow < 2.8 (for older versions, use TensorFlowASR ~= v1.x)
  • Remove unused dependencies
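The benefit of featurizer-independent records is that tokenization moves from record-creation time to load time, so one set of records can serve any text featurizer. The sketch below illustrates that design in plain Python: dicts stand in for TFRecord examples, a file name stands in for the audio bytes, and `make_record`, `load_example`, `char_featurizer` and `word_featurizer` are hypothetical names, not the ASRDatasets API.

```python
def make_record(audio, transcript):
    """Store raw inputs only: no token indices, so no featurizer is needed here."""
    return {"audio": audio, "transcript": transcript}


def char_featurizer(text):
    """Toy character-level featurizer: space -> 0, 'a'..'z' -> 1..26 (hypothetical)."""
    return [0 if c == " " else ord(c) - ord("a") + 1 for c in text]


WORD_VOCAB = {"<unk>": 0, "hi": 1, "there": 2}  # hypothetical toy vocabulary


def word_featurizer(text):
    """Toy word-level featurizer: unknown words map to index 0 (hypothetical)."""
    return [WORD_VOCAB.get(w, 0) for w in text.split()]


def load_example(record, featurizer):
    """Tokenize at load time, so any featurizer can reuse the same records."""
    return record["audio"], featurizer(record["transcript"])


record = make_record("clip_0001.wav", "hi there")
print(load_example(record, char_featurizer))  # ('clip_0001.wav', [8, 9, 0, 20, 8, 5, 18, 5])
print(load_example(record, word_featurizer))  # ('clip_0001.wav', [1, 2])
```

Switching featurizers (for example, from characters to WordPiece or SentencePiece) then only changes the load step, without regenerating the records.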

nglehuy • Apr 03 '22 04:04