41 comments by Tao Lei

Check whether you have recent versions of `cupy`, `pynvrtc`, and CUDA installed.
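A quick way to see which Python-side versions are installed is to ask `importlib.metadata` for each distribution. This is a minimal sketch. One assumption to note: CuPy is often installed under a CUDA-specific wheel name (for example `cupy-cuda12x`), so the plain name `cupy` may report as missing. The CUDA toolkit itself is checked separately, e.g. with `nvcc --version`.

```python
from importlib import metadata


def installed_versions(dists):
    """Map each distribution name to its installed version, or None if it is not installed."""
    found = {}
    for name in dists:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumption: the CuPy wheel name depends on the CUDA version it was built for.
    print(installed_versions(["cupy", "cupy-cuda12x", "pynvrtc"]))
```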

Hi, since `U` is computed by multiplying `X` with `weight` using `torch.mm()` (https://github.com/taolei87/sru/blob/master/cuda_functional.py#L677), PyTorch will handle the backward gradient computation automatically (i.e. grad_u -> grad_w). I believe TF supports the...
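The step autograd performs here is just the chain rule for a matrix product: if `U = X W`, then `dL/dW = X^T (dL/dU)`. So a custom recurrence op only has to return `grad_u`, and the framework maps it back to the weight. The sketch below spells out that rule by hand in plain Python (no PyTorch) and checks it with a finite difference; the function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of row-major lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def grad_weight(x, grad_u):
    """For U = X W, the chain rule gives dL/dW = X^T (dL/dU)."""
    return matmul(transpose(x), grad_u)
```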

Hi @kzjeef, as you can see in my code and musyoku's code, there are two possible ways to implement it: (1) you define your operator to take `U` as input instead...
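Option (1) can be sketched as follows: the projection `U = X W` is an ordinary matrix multiply, and the custom operator only runs the elementwise recurrence over the precomputed `U`. This is a plain-Python illustration assuming the original SRU equations (forget gate `f`, reset gate `r`, `tanh` activation, highway term on `x`); the layout of `u` (candidate, forget pre-activation, reset pre-activation, concatenated per step) and all names are hypothetical, not the repository's kernel.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def project(x, w):
    """U = X W for row-major lists: x is (T, d_in), w is (d_in, 3 * d)."""
    return [[sum(xi * w[i][j] for i, xi in enumerate(row)) for j in range(len(w[0]))]
            for row in x]


def sru_recurrence(u, x, b_f, b_r, c0):
    """Elementwise SRU recurrence that takes the precomputed U as input.

    Hypothetical layout: u[t] = [x_tilde (d) | f_pre (d) | r_pre (d)].
    x[t] must have size d so it can feed the highway term. Returns (h, c_last).
    """
    d = len(c0)
    c = list(c0)
    h = []
    for ut, xt in zip(u, x):
        ht = []
        for k in range(d):
            f = sigmoid(ut[d + k] + b_f[k])        # forget gate
            r = sigmoid(ut[2 * d + k] + b_r[k])    # reset (highway) gate
            c[k] = f * c[k] + (1.0 - f) * ut[k]    # cell state update
            ht.append(r * math.tanh(c[k]) + (1.0 - r) * xt[k])
        h.append(ht)
    return h, c
```

Because `project` is a standard matmul, the framework differentiates it on its own; only `sru_recurrence` would need a hand-written backward.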

@wangwei7175878 Hi, could you check whether you are using CUDA 8 and whether PyTorch is using the GPU (i.e. `torch.cuda.is_available()` returns `True`)?
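Both checks can be gathered in one small diagnostic, sketched below. It uses `torch.version.cuda` (the CUDA version PyTorch was built against, `None` for CPU-only builds) and `torch.cuda.is_available()`; the function name and the returned keys are hypothetical. It returns a placeholder result instead of crashing when PyTorch is not installed.

```python
def cuda_diagnostics():
    """Summarize the PyTorch version, its CUDA build version, and GPU visibility."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "cuda_available": False, "device": None}
    available = torch.cuda.is_available()
    return {
        "torch": str(torch.__version__),
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(cuda_diagnostics())
```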

@byzhang Hi, I haven't run very thorough experiments on this. I noticed that highway connections help quite a bit on the language modeling task in our earlier ICML work, so I stick with...

@Zadagu SRU / SRU++ works with PyTorch's native mixed precision training. See this for an example: https://github.com/asappresearch/sru/blob/3.0.0-dev/experiments/srupp_experiments/train_enwik8.py#L250
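For reference, a generic PyTorch native AMP training step looks roughly like the sketch below: the forward pass and loss run under `torch.autocast`, and `GradScaler` scales the loss before `backward()`. This is not the linked script; the `nn.Linear` model is a hypothetical stand-in for an SRU / SRU++ model, and the MSE loss is only for illustration. AMP is enabled only when a CUDA device is present.

```python
import torch
import torch.nn as nn


def train_step(model, batch, target, optimizer, scaler, device, use_amp=True):
    """One optimization step with PyTorch native automatic mixed precision."""
    optimizer.zero_grad(set_to_none=True)
    # Forward pass and loss under autocast: eligible ops run in reduced precision.
    with torch.autocast(device_type=device.type, enabled=use_amp):
        output = model(batch.to(device))
        loss = nn.functional.mse_loss(output.float(), target.to(device))
    # Scale the loss to avoid fp16 gradient underflow; scaler.step() unscales before stepping.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    use_amp = device.type == "cuda"  # fp16 autocast + GradScaler target CUDA
    model = nn.Linear(16, 4).to(device)  # hypothetical stand-in for an SRU / SRU++ model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    x, y = torch.randn(8, 16), torch.randn(8, 4)
    print(train_step(model, x, y, optimizer, scaler, device, use_amp))
```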

@hadaev8 Yes and no. Yes, in the sense that within each SRU++ layer, the layer attends to both its own outputs and the memory inputs. No, in the sense that...
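The "yes" part can be pictured as attention whose keys and values are the concatenation of the memory and the current segment, while queries come only from the current segment. The plain-Python sketch below illustrates just that idea under stated simplifications that are assumptions, not SRU++ details: a single head, no learned query/key/value projections (keys and values are the vectors themselves), and causal masking within the current segment.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_memory(current, memory, causal=True):
    """Queries from `current`; keys/values from [memory; current]."""
    context = memory + current
    dim = len(current[0])
    outputs = []
    for t, query in enumerate(current):
        # Position t sees every memory vector plus current positions 0..t.
        visible = context[: len(memory) + t + 1] if causal else context
        weights = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                           for key in visible])
        outputs.append([sum(w * value[j] for w, value in zip(weights, visible))
                        for j in range(dim)])
    return outputs
```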

Hi, sorry for the late reply. Did you check the GPU usage (e.g. with `nvidia-smi`) while running the code? I suspect the usage is low and the bottleneck is IO instead...
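Besides watching `nvidia-smi`, one rough way to confirm an IO bottleneck is to time how long the loop waits on the data iterator versus how long it spends in the training step. The helper below is a hypothetical sketch; on a GPU, the step should call `torch.cuda.synchronize()` before returning, otherwise asynchronous kernels make the compute time look smaller than it is.

```python
import time


def time_data_vs_compute(batches, step):
    """Split wall-clock time between fetching batches (IO / loading) and running `step`."""
    data_time = compute_time = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # on GPU, synchronize inside `step` so this interval is accurate
        t2 = time.perf_counter()
        data_time += t1 - t0
        compute_time += t2 - t1
    return data_time, compute_time
```

If `data_time` dominates, speeding up the model will not help much; the loader (disk reads, preprocessing, number of workers) is the place to look.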

Hi, the new version of `pynvrtc` expects `str` objects instead of `bytes` objects as input to `Program()`. You can fix the error by removing the `encode('utf-8')` and `encode()` calls in: `_SRU_PROG...
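Since the fix amounts to passing plain `str` objects to `Program()`, a small guard can normalize any source or file name that is still `bytes`. This is a sketch: `as_text` and `KERNEL_SOURCE` are hypothetical names, and the commented usage assumes the `from pynvrtc.compiler import Program` import; check the call against your installed `pynvrtc` version.

```python
def as_text(value):
    """Return `value` as a str, decoding UTF-8 bytes if necessary."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8")
    return value


# Usage sketch (hypothetical names; assumes `from pynvrtc.compiler import Program`):
#   prog = Program(as_text(KERNEL_SOURCE), as_text("sru_prog.cu"))
```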

Hi @Sunnydreamrain, no, `grad_last` is not always zero. In some cases the model passes the last cell state `c_t` into subsequent model components. For example, in a sequence-to-sequence task...
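To see why `grad_last` matters, consider the backward pass of a simplified elementwise recurrence `c_t = f_t * c_{t-1} + (1 - f_t) * x_t`. The running gradient on `c` starts from `grad_last` (the gradient that downstream components, such as a decoder initialized from the encoder's final state, send to `c_T`) and then accumulates per-step gradients. It is zero only when nothing downstream uses `c_T`. This plain-Python sketch uses scalars per step, omits the gradient with respect to `f`, and is not the repository's CUDA kernel; the function names are hypothetical.

```python
def forward(x, f, c0):
    """c_t = f_t * c_{t-1} + (1 - f_t) * x_t, one scalar per step."""
    cs, c = [], c0
    for xt, ft in zip(x, f):
        c = ft * c + (1.0 - ft) * xt
        cs.append(c)
    return cs


def backward(x, f, grad_c, grad_last):
    """Backprop through the recurrence.

    grad_c[t]: gradient reaching c_t through the layer's per-step outputs.
    grad_last: gradient on the final state c_T from downstream components.
    Returns (grad_x, grad_c0); gradients w.r.t. f are omitted for brevity.
    """
    g = grad_last  # running dL/dc_t, seeded with the downstream gradient on c_T
    grad_x = [0.0] * len(x)
    for t in reversed(range(len(x))):
        g += grad_c[t]
        grad_x[t] = g * (1.0 - f[t])
        g *= f[t]  # propagate to c_{t-1}
    return grad_x, g
```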