TensorFlowASR WordPiece, Sentencepiece, Refactor, Correct configurations

Open nglehuy opened this issue 3 years ago • 0 comments

Add another text_featurizer that uses WordPiece via tensorflow_text.FastWordpieceTokenizer
Update SentencepieceFeaturizer to use tensorflow_text.FastSentencepieceTokenizer
Add a tf_extract function to text_featurizer to support datasets on TPUs via the use_tf: True option
Add a jit_compile option to the model's compile (for faster fixed-shape training using XLA)
Add Gaussian weight noise in the transducer decoder, with a wrapper function to apply and remove the weight noise
Add convolution blur pool
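
To make the "apply and remove" idea for weight noise concrete, here is a minimal framework-free sketch. It is not the TensorFlowASR implementation: the name `gaussian_weight_noise`, the `stddev` default, and the flat list standing in for decoder variables are all assumptions made for illustration.

```python
import random
from contextlib import contextmanager


@contextmanager
def gaussian_weight_noise(weights, stddev=0.075, seed=None):
    """Temporarily add zero-mean Gaussian noise to a flat list of weights.

    Hypothetical helper: `weights` is mutated in place on entry and
    restored from a snapshot on exit.
    """
    rng = random.Random(seed)
    original = list(weights)  # snapshot, so removal is exact
    for i, w in enumerate(original):
        weights[i] = w + rng.gauss(0.0, stddev)
    try:
        yield weights
    finally:
        weights[:] = original  # remove the noise even if the step raised


decoder_weights = [0.5, -1.25, 2.0]  # stand-in for the decoder's variables
with gaussian_weight_noise(decoder_weights, stddev=0.1, seed=0) as noisy:
    noisy_copy = list(noisy)  # the forward/backward pass would run here
```

Restoring from a snapshot rather than subtracting the sampled noise avoids floating-point drift in the clean weights; a real training step would then apply the gradients to the restored values.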

Refactor code (correct models, functions, unit tests, configs, ...)
Update ASRDatasets to support text-featurizer-independent tfrecords (create tfrecords with only audio and transcript, instead of audio and indices)
Replace some experimental options with their officially supported equivalents
Drop support for tensorflow < 2.8 (for older versions, please use TensorFlowASR ~= v1.x)
Remove unused dependencies
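
The point of text-featurizer-independent records can be sketched without TensorFlow. Everything below is hypothetical (plain dicts instead of tfrecords, toy tokenizers instead of the real featurizers); it only shows why storing the transcript rather than indices lets one record serve several featurizers.

```python
TOY_VOCAB = {"hello": 1, "world": 2}  # hypothetical word-level vocabulary


def make_record(audio_bytes, transcript):
    """Store only raw audio and raw text; no token indices are baked in."""
    return {"audio": audio_bytes, "transcript": transcript.encode("utf-8")}


def load_record(record, tokenize):
    """Decode a record and tokenize the transcript at read time."""
    text = record["transcript"].decode("utf-8")
    return record["audio"], tokenize(text)


def char_tokenize(text):
    """Toy character-level featurizer: one code point per character."""
    return [ord(c) for c in text]


def word_tokenize(text):
    """Toy word-level featurizer: unknown words map to index 0."""
    return [TOY_VOCAB.get(word, 0) for word in text.split()]


record = make_record(b"\x00\x01", "hello world")
_, char_ids = load_record(record, char_tokenize)
_, word_ids = load_record(record, word_tokenize)
```

Because the indices are computed when the record is read, switching featurizers (for example from SentencePiece to WordPiece) would not require regenerating the stored data.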

Apr 03 '22 04:04 nglehuy