Jan "yenda" Trmal
Does it look like some timing/kernel sync issue?
Setting
```
export K2_DISABLE_CHECKS=0
export K2_SYNC_KERNELS=1
export CUDA_LAUNCH_BLOCKING=1
```
didn't change the behavior, though.
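For reference, a minimal sketch of how these flags would be used to rerun just the failing test with synchronous kernel launches (the `ctest -R` invocation is an assumption about the workflow, not taken from the logs above):

```shell
# Make CUDA kernel launches synchronous so a failure is attributed to the
# kernel that actually raised it, rather than a later launch.
export K2_SYNC_KERNELS=1
export CUDA_LAUNCH_BLOCKING=1
# Then, from the build directory, rerun only the failing test, e.g.:
#   ctest -R rnnt_loss_test_py --output-on-failure
```

With `CUDA_LAUNCH_BLOCKING=1`, each launch blocks until the kernel completes, which makes stack traces point at the real culprit at the cost of speed.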
It did succeed on CPU, I think:
```
$ CUDA_VISIBLE_DEVICES= ctest --rerun-failed --output-on-failure
Test project /home/jtrmal/projects/k2/build_debug
    Start 97: rnnt_loss_test_py
1/1 Test #97: rnnt_loss_test_py ................   Passed    1.52 sec

100% tests passed,...
```
sorry for spamming :/
```
======================================================================
FAIL: test_rnnt_loss_empty_reference (__main__.TestRnntLoss)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jtrmal/projects/k2/k2/python/tests/rnnt_loss_test.py", line 850, in test_rnnt_loss_empty_reference
    torch.testing.assert_close(
  File "/home/jtrmal/.local/lib/python3.9/site-packages/torch/testing/_comparison.py", line 1342, in assert_close
    assert_equal(
  File "/home/jtrmal/.local/lib/python3.9/site-packages/torch/testing/_comparison.py", line 1093, in...
```
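For anyone unfamiliar with what `torch.testing.assert_close` is checking here: per element it is roughly a relative/absolute tolerance comparison. A plain-Python sketch of that check (the `rtol`/`atol` values below are illustrative assumptions, not necessarily torch's exact defaults for this dtype):

```python
# Sketch of the per-element check assert_close performs, approximately:
#   |actual - expected| <= atol + rtol * |expected|
def close(actual, expected, rtol=1.3e-6, atol=1e-5):
    return abs(actual - expected) <= atol + rtol * abs(expected)

print(close(1.0, 1.0 + 1e-6))  # tiny difference, within tolerance -> True
print(close(1.0, 1.1))         # large difference, outside tolerance -> False
```

A GPU/CPU mismatch that trips this check therefore means the difference exceeded both tolerances, not just bit-level nondeterminism.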
Tested cuDNN 8.3, 8.6, and 8.8, and could reproduce the failure on all three.
It's a 3090. I'll get in touch with Desh to see if he can dig deeper than I can.
[cpu.log](https://github.com/k2-fsa/k2/files/10794949/cpu.log) [gpu.log](https://github.com/k2-fsa/k2/files/10794950/gpu.log)

I'm attaching logs from both the CPU and GPU runs, obtained as
```
CUDA_VISIBLE_DEVICES= ctest --rerun-failed --verbose > cpu.log
CUDA_VISIBLE_DEVICES=0 ctest --rerun-failed --verbose > gpu.log
```
The CPU run succeeded,...
I think even several "normal" syslogs can direct their output to a remote machine.
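For example, classic syslog daemons support remote forwarding with a one-line rule; a sketch in rsyslog syntax (the host name is hypothetical):

```
# /etc/rsyslog.conf (sketch): forward all facilities/priorities to a collector
*.*  @loghost.example.com:514    # single @ = UDP
*.*  @@loghost.example.com:514   # double @@ = TCP
```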