Ivan Sorokin
Hello [Huahuan Zheng](https://github.com/maxwellzh), interesting theory! But I don't think it will be useful in practice. Optimising the forward pass doesn't make sense. You can check the CUDA profiler logs. The...
I’m not familiar with the memory manager for CUDA threads. But you are right, having a TxU matrix is the main bottleneck. Fortunately, there is a solution for this: [fast_rnnt](https://github.com/danpovey/fast_rnnt). It looks really promising.
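To give a rough feel for why the TxU matrix hurts, here is a back-of-the-envelope sketch. The function name `lattice_bytes` and all the sizes are made up for illustration, and I'm assuming a dense float32 tensor of shape (batch, T, U, vocab); real numbers depend on the model and the implementation.

```python
def lattice_bytes(batch, T, U, vocab=1, bytes_per_elem=4):
    """Bytes needed for a dense (batch, T, U, vocab) tensor.

    All arguments are illustrative assumptions, not values from any library.
    bytes_per_elem=4 corresponds to float32.
    """
    return batch * T * U * vocab * bytes_per_elem


# Hypothetical sizes: 16 utterances, 500 frames, 100 labels, 1000 output units.
size = lattice_bytes(batch=16, T=500, U=100, vocab=1000)
print(f"{size / 2**30:.2f} GiB")
```

Doubling either T or U doubles the footprint, so long utterances with long transcripts get expensive quickly.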
Hey, thank you for the feedback. Could you try the new version, `pip install warp-rnnt==0.6.0`? It should be solved now.
Please share your environment details: OS, Python version, PyTorch version, CUDA/cuDNN version, GCC version. Could you also try another version of GCC?
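If it helps, something like the snippet below should gather most of this in one go. It's only a sketch: `collect_env` is a helper name I made up, it assumes `gcc` is on your `PATH`, and it just reports "not installed" if PyTorch can't be imported. PyTorch also ships its own reporter, `python -m torch.utils.collect_env`, which prints a more complete summary.

```python
import platform
import shutil
import subprocess
import sys


def collect_env():
    """Collect basic environment details into a dict (hypothetical helper)."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }

    # PyTorch is optional here so the script still runs without it.
    try:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
        info["cudnn"] = torch.backends.cudnn.version()  # None if unavailable
    except ImportError:
        info["torch"] = "not installed"

    # Assumes the compiler is called `gcc` and is discoverable on PATH.
    gcc = shutil.which("gcc")
    if gcc:
        out = subprocess.run(
            [gcc, "--version"], capture_output=True, text=True
        ).stdout
        info["gcc"] = out.splitlines()[0] if out else "unknown"
    else:
        info["gcc"] = "not found"

    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting the printed output into the issue is enough.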
Do you have [cudatoolkit](https://anaconda.org/anaconda/cudatoolkit) installed? Make sure its version matches your CUDA version.
I have never tried to install it on Windows. Did this happen after upgrading from warp-rnnt 0.5.0 to 0.6.0?
I was not looking specifically for a TensorFlow implementation. If you find one, please share it.
Hey, Albert! I'm curious why you chose this implementation. Is it because it is a rare implementation of RNA, or because of the CUDA warp threads? In any case, I'm glad to see...