He Huang (Steve)
> Hey, so with this PR I don't need to define the max_step param for any scheduler?

Not really, this PR aims to fix the bug that the current code...
The problem was already fixed in [this PR](https://github.com/NVIDIA/NeMo/pull/4470), closing this PR
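For context, the max_step-style scheduler param in question typically sits in the optimizer/scheduler section of the model config. A minimal sketch of that layout (the exact keys and values below are illustrative assumptions, not taken from this PR or from the linked one):

```python
# Minimal sketch of a NeMo-style optimizer/scheduler config (illustrative values only;
# whether max_steps can be left out here is what the question above is asking about).
from omegaconf import OmegaConf

optim_cfg = OmegaConf.create({
    "name": "adamw",
    "lr": 1e-3,
    "sched": {
        "name": "CosineAnnealing",   # assumed scheduler choice, for illustration
        "warmup_steps": 1000,
        "max_steps": 100000,         # the param the question above refers to
    },
})
print(OmegaConf.to_yaml(optim_cfg))
```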
> @stevehuang52, sorry for the delay in reviewing the PR. There are currently a lot of design questions around transformers that we are discussing.
>
> Have you tried using the...
> @stevehuang52 we'd like to "deprecate" non-Megatron transformers in NeMo. Can you please have a look at whether you can use those?

@okuchaiev Do Megatron transformers have a sequence generator similar...
> Megatron transformers require apex, and I'd like to avoid that as much as possible for ASR. @stevehuang52 please try to see if the ordinary transformer blocks will work for your...
@titu1994 @nithinraok could you please take another look to see if your comments have been addressed? Thanks~
@titu1994 @zhehuaichen I've refactored the dataset so that the input and output keys can be configured dynamically by setting `context_key` and `answer_key`. For example, if we want...
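Roughly, the intent is that the dataset reads a manifest and maps whichever fields `context_key` and `answer_key` point to onto the model's input and target. A minimal sketch of that idea (the helper function, manifest layout, and field names below are made-up illustrations, not the actual dataset class in the PR):

```python
# Minimal, self-contained sketch of configurable input/output keys
# (hypothetical helper; the real dataset class is more involved).
import json

def load_samples(manifest_path, context_key="context", answer_key="answer"):
    """Read a JSONL manifest and pick input/target fields via configurable keys."""
    samples = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            samples.append({
                "audio_filepath": entry["audio_filepath"],  # assumed audio field
                "context": entry[context_key],  # e.g. an instruction/question field
                "answer": entry[answer_key],    # e.g. a transcript/response field
            })
    return samples

# e.g. load_samples("train.json", context_key="question", answer_key="text")
# would train on a QA-style manifest without changing the dataset code.
```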
@zhehuaichen FYI I removed the `random context training` trick from the dataset, since it only makes sense for word-boosting and not other tasks. It's better to actually generate those word-boosting...
> > @zhehuaichen FYI I removed the `random context training` trick from the dataset, since it only makes sense for word-boosting and not other tasks. It's better to actually generate...
jenkins