Vladimir Bataev
@galv as far as I can see, the issue can be fixed by passing the appropriate device to the CUDA stream initializers and getters:
```
def with_conditional_node(while_loop_kernel, while_loop_args, while_loop_conditional_handle, device):
    ...
    body_stream = torch.cuda.Stream(device=device)
    previous_stream...
```
@galv I tried some changes, and it seems I can get it to work. But I'm wondering why these changes are required and why everything works when creating a graph for...
@galv I manually restarted Jenkins, but it is still waiting for an executor
@galv please fix the test failing on Jenkins (the guard is needed):

> FAILED tests/collections/asr/decoding/test_cuda_graph_rnnt_greedy_decoding.py::test_change_devices - ImportError: Found cuda-python 12.3.0rc4+8.gcb4e395, but at least version 12.3.0 is needed.
I'm sorry, Gram-CTC is not yet implemented, but it is the first-priority future task: [https://github.com/artbataev/end2end#future-plans](https://github.com/artbataev/end2end#future-plans), and I'm working on it. For now only CTC-Loss and CTC Beam Search Decoder with...
@GNroy, can you please provide the instructions for testing this PR? A model + test set + LM/text to build LM with the sequence of required operations to build graphs/start...
Please also fix the CodeQL suggestions; most of them seem to be valuable (e.g., `'except' clause does nothing but pass and there is no explanatory comment.`, `This 'lambda' is just...
Implemented in #13917
@dorispei, @utunga Please check https://github.com/NVIDIA-NeMo/NeMo/pull/15173. That PR adds a fallback option that uses native PyTorch CUDA graphs if full graph compilation fails. It should work a bit slower than default full...
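
To illustrate the general "try the full graph first, fall back otherwise" idea, here is a minimal sketch. It is not the code from the PR: `build_with_fallback`, `failing_full_build`, and `native_build` are hypothetical names, the builders are plain-Python stand-ins rather than real CUDA graph compilation, and catching every `Exception` is an assumption about when the fallback should kick in.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def build_with_fallback(
    build_full: Callable[[], T], build_native: Callable[[], T]
) -> Tuple[T, str]:
    """Return (decoder, mode), trying the full-graph builder first."""
    try:
        return build_full(), "full_graph"
    except Exception as exc:  # assumption: any compilation error triggers the fallback
        print(f"Full graph compilation failed ({exc}); using fallback")
        return build_native(), "native_fallback"


# Hypothetical stand-ins for the two compilation paths.
def failing_full_build() -> str:
    raise RuntimeError("conditional nodes unavailable")


def native_build() -> str:
    return "decoder built with native PyTorch CUDA graphs"


decoder, mode = build_with_fallback(failing_full_build, native_build)
print(mode)
```

Returning the mode alongside the result lets the caller log which path was taken, which can help when comparing speed between the two paths.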