audio RNN Transducer Loss

RNN Transducer Loss

Open vincentqb opened this issue 4 years ago • 1 comment

This issue is to track the follow-up work to #1137, which introduced rnnt_loss and RNNTLoss as a prototype in torchaudio.prototype.transducer using HawkAaron's warp-transducer.
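For context on what the loss computes, here is a minimal single-utterance sketch of the transducer forward recursion in plain Python. It is not the warp-transducer or torchaudio implementation. The name `rnnt_loss_reference`, the `blank=0` default, and the nested-list input of already log-softmaxed values with shape `T x (U + 1) x V` are all assumptions made for illustration.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)) stably; -inf acts as the identity."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_loss_reference(log_probs, targets, blank=0):
    """Negative log-likelihood of `targets` for one utterance.

    log_probs[t][u][k] is the log-probability of emitting symbol k at
    time step t after u target symbols (hypothetical layout, assumed
    to be log-softmaxed over k already).
    """
    T = len(log_probs)
    U = len(targets)
    # alpha[t][u]: log-probability of reaching lattice node (t, u).
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:
                # Arrive from (t - 1, u) by emitting blank.
                step = alpha[t - 1][u] + log_probs[t - 1][u][blank]
                alpha[t][u] = log_add(alpha[t][u], step)
            if u > 0:
                # Arrive from (t, u - 1) by emitting the next target symbol.
                step = alpha[t][u - 1] + log_probs[t][u - 1][targets[u - 1]]
                alpha[t][u] = log_add(alpha[t][u], step)
    # Every alignment ends with a final blank emitted from (T - 1, U).
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])
```

A quadratic-time reference like this is mainly useful for checking a fast implementation on tiny inputs, for example in unit tests.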

Update documentation
- [ ] Guard documentation (e.g. with conditional doc build)
- [x] Write e.g. torchtext (#1171)
Extend guards for prototype
- [ ] Guard prototype python files by omitting them from torchaudio, see also https://github.com/pytorch/audio/pull/1137#discussion_r551496192
- [x] Guard building third party transducer even if not added as an extension (#1159)
- [x] Enable building transducer in nightlies only, disable for release.
Update building process
- [x] Pass along the DEBUG flag to cmake
- [x] Remove hardcoded O2/O3 optimization, see https://github.com/pytorch/audio/pull/1137#discussion_r551498022 (#1159)
- [x] Build within same folders as libsox, https://github.com/pytorch/audio/pull/1137#discussion_r551499829 and https://github.com/pytorch/audio/pull/1137#discussion_r551556305 (#1159)
- [x] Move libsox to a third_party subfolder as suggested in https://github.com/pytorch/audio/pull/1137#discussion_r550321378 (#1161).
- [x] Add GPU implementation and compilation. (see https://github.com/pytorch/audio/pull/1483)
- [x] Add a USE_CUDA option for users: the build currently depends on the presence of a device, see here, and pytorch.
- [ ] Add CUDA build binaries, https://github.com/pytorch/audio/pull/1497
Modernization
- [x] Migrate the checks to C++.
- [x] Add autograd test https://github.com/pytorch/audio/pull/1532
- [x] Add TorchScriptability test (attempt, internal).
- [ ] Investigate using AT_DISPATCH_FLOATING_TYPES.
- [x] Update bindings to remove pytorch deprecation warnings. (#1160)
- [x] Refactor and update the API, see warprnnt and internal.
- [x] Add support for float16.
- [ ] rnnt loss should not capture the gradient here. (Should rnnt loss custom C++ autograd function return the gradient?)
- [x] Remove numpy test utilities from tests
- [ ] Replace the change of parameter with an assertion here

cc @astaff, internal

Feb 04 '21 22:02 vincentqb

Is there a plan to support packed-layout logits in the RNNT loss?

Ref: Sec 3.1 https://arxiv.org/abs/1909.12415
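To make the request concrete, here is a rough sketch of the indexing a packed layout implies, as I understand it: instead of padding every utterance to `(max T, max U + 1)`, only the valid `T_n x (U_n + 1)` cells are stored back to back. The helper names and the row-major ordering within each utterance are assumptions for illustration, not an existing torchaudio API.

```python
from itertools import accumulate


def padded_rows(logit_lengths, target_lengths):
    """Number of V-sized rows in a padded (N, max T, max U + 1, V) tensor."""
    n = len(logit_lengths)
    return n * max(logit_lengths) * (max(target_lengths) + 1)


def packed_offsets(logit_lengths, target_lengths):
    """Start row of each utterance when only valid cells are stored.

    The last entry is the total number of rows in the packed tensor.
    """
    sizes = [t * (u + 1) for t, u in zip(logit_lengths, target_lengths)]
    return [0] + list(accumulate(sizes))


def packed_index(offsets, target_lengths, n, t, u):
    """Row of lattice node (t, u) of utterance n (row-major, assumed)."""
    return offsets[n] + t * (target_lengths[n] + 1) + u


# Two utterances with T = [4, 2] frames and U = [2, 1] target symbols.
logit_lengths = [4, 2]
target_lengths = [2, 1]
offsets = packed_offsets(logit_lengths, target_lengths)
print(padded_rows(logit_lengths, target_lengths), offsets[-1])
```

With these lengths the padded layout holds 24 rows while the packed one holds 16, and the gap grows as lengths within a batch become more uneven.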

Jun 17 '22 07:06 maxwellzh
