RNN Transducer Loss
This issue tracks the follow-up work to #1137, which introduced `rnnt_loss` and `RNNTLoss` as a prototype in `torchaudio.prototype.transducer` using HawkAaron's warp-transducer.
- Update documentation
- [ ] Guard documentation (e.g. with conditional doc build)
- [x] Write e.g. torchtext (#1171)
- Extend guards for prototype
- [ ] Guard prototype python files by omitting them from torchaudio, see also https://github.com/pytorch/audio/pull/1137#discussion_r551496192
- [x] Guard building third party transducer even if not added as an extension (#1159)
- [x] Enable building transducer in nightlies only, disable for release.
- Update building process
- [x] Pass along the `DEBUG` flag to cmake
- [x] Remove hardcoded O2/O3 optimization, see https://github.com/pytorch/audio/pull/1137#discussion_r551498022 (#1159)
- [x] Build within same folders as libsox, https://github.com/pytorch/audio/pull/1137#discussion_r551499829 and https://github.com/pytorch/audio/pull/1137#discussion_r551556305 (#1159)
- [x] Move libsox to a third_party subfolder as suggested in https://github.com/pytorch/audio/pull/1137#discussion_r550321378 (#1161).
- [x] Add GPU implementation and compilation. (see https://github.com/pytorch/audio/pull/1483)
- [x] Add `USE_CUDA` option for user: build currently depends on presence of device, see here, and pytorch.
- [ ] Add CUDA build binaries, https://github.com/pytorch/audio/pull/1497
- Modernization
- [x] Migrate the checks to C++.
- [x] Add autograd test https://github.com/pytorch/audio/pull/1532
- [x] Add Torchscriptability test (attempt, internal).
- [ ] Investigate using `AT_DISPATCH_FLOATING_TYPES`.
- [x] Update bindings to remove pytorch deprecation warnings. (#1160)
- [x] Refactor and update the API, see warprnnt and internal.
- [x] Add support for `float16`.
- [ ] rnnt loss should not capture the gradient here. (Should rnnt loss custom C++ autograd function return the gradient?)
- [x] Remove numpy test utilities from tests
- [ ] Replace change of parameter to assertion here
cc @astaff, internal
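
For the `USE_CUDA` item, a minimal sketch of how a build script could let the user override device auto-detection through an environment variable. The helper name `env_flag`, the accepted spellings, and the `device_present` stand-in are assumptions made for illustration; this is not the actual torchaudio build code.

```python
import os


def env_flag(name, default, environ=None):
    """Read a boolean build option from an environment variable.

    Hypothetical helper: if the variable is unset, fall back to `default`
    (e.g. the result of device auto-detection).
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().upper()
    if value in {"1", "ON", "TRUE", "YES"}:
        return True
    if value in {"0", "OFF", "FALSE", "NO"}:
        return False
    raise ValueError(f"Unexpected value for {name}: {raw!r}")


# `device_present` stands in for an auto-detection result
# (assumption: something like torch.cuda.is_available() in a real build).
device_present = False

# An explicit USE_CUDA=1 wins over the auto-detected default.
use_cuda = env_flag("USE_CUDA", default=device_present, environ={"USE_CUDA": "1"})
print(use_cuda)
```

With this shape, a machine without a GPU can still request a CUDA build, and a machine with a GPU can opt out, instead of the build depending only on the presence of a device.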
Is there a plan to support a packed layout for the logits in the RNNT loss?
Ref: Sec 3.1 https://arxiv.org/abs/1909.12415
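
To make the question concrete, here is a rough sketch of the size difference between a padded and a packed logits layout. It assumes the padded tensor has shape `(N, T_max, U_max + 1, V)` and that a packed layout stores only the valid `T_i * (U_i + 1)` positions of each sequence back to back. The function names and the `+ 1` for the initial blank position are assumptions for illustration, not the paper's or torchaudio's exact scheme.

```python
def padded_numel(src_lengths, tgt_lengths, vocab_size):
    """Number of elements in a padded (N, T_max, U_max + 1, V) logits tensor."""
    n = len(src_lengths)
    return n * max(src_lengths) * (max(tgt_lengths) + 1) * vocab_size


def packed_numel(src_lengths, tgt_lengths, vocab_size):
    """Number of elements when only valid (t, u) positions are stored."""
    rows = sum(t * (u + 1) for t, u in zip(src_lengths, tgt_lengths))
    return rows * vocab_size


def packed_offsets(src_lengths, tgt_lengths):
    """Starting row of each sequence inside the packed (rows, V) matrix."""
    offsets, start = [], 0
    for t, u in zip(src_lengths, tgt_lengths):
        offsets.append(start)
        start += t * (u + 1)
    return offsets


# Toy batch: 3 utterances with different frame and label counts.
src_lengths = [4, 2, 3]   # acoustic frames per utterance (T_i)
tgt_lengths = [2, 1, 3]   # labels per utterance (U_i)
vocab_size = 5

print(padded_numel(src_lengths, tgt_lengths, vocab_size))  # 240
print(packed_numel(src_lengths, tgt_lengths, vocab_size))  # 140
print(packed_offsets(src_lengths, tgt_lengths))            # [0, 12, 16]
```

The saving grows with the spread of sequence lengths in a batch, which is why a loss that accepts packed logits directly would avoid materializing the padded tensor.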