Tao Lei comments

Results: 41 comments by Tao Lei

ImportError: You are running using the stub version of nvrtc

Check that you have recent versions of `cupy`, `pynvrtc`, and CUDA installed.

About Grad: gradient check failed in some case, how to correct calculate x's gradient ?

Hi, since `U` is computed by multiplying `X` with `weight` using `torch.mm()` (https://github.com/taolei87/sru/blob/master/cuda_functional.py#L677), PyTorch will handle the backward gradient computation automatically (i.e. grad_u -> grad_w). I believe TF supports the...
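To make the grad_u -> grad_w step concrete, here is a minimal pure-Python sketch of what the automatic backward pass amounts to for `U = X @ W`, checked against finite differences. It is not the SRU kernel: the toy loss (sum of squares of `U`), the matrices, and all helper names are assumptions made up for illustration.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def loss(x, w):
    """Toy scalar loss (an assumption): sum of squares of U = X @ W."""
    u = matmul(x, w)
    return sum(v * v for row in u for v in row)


def analytic_grads(x, w):
    """Chain rule through the matrix product."""
    u = matmul(x, w)
    grad_u = [[2.0 * v for v in row] for row in u]  # dL/dU for the toy loss
    grad_w = matmul(transpose(x), grad_u)           # dL/dW = X^T @ grad_u
    grad_x = matmul(grad_u, transpose(w))           # dL/dX = grad_u @ W^T
    return grad_x, grad_w


def numeric_grad(f, m, eps=1e-6):
    """Central finite differences of f() with respect to each entry of m."""
    grad = [[0.0] * len(m[0]) for _ in m]
    for i in range(len(m)):
        for j in range(len(m[0])):
            old = m[i][j]
            m[i][j] = old + eps
            plus = f()
            m[i][j] = old - eps
            minus = f()
            m[i][j] = old  # restore the original entry
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad


def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]

grad_x, grad_w = analytic_grads(X, W)
num_x = numeric_grad(lambda: loss(X, W), X)
num_w = numeric_grad(lambda: loss(X, W), W)

print("max |analytic - numeric| for X:", max_abs_diff(grad_x, num_x))
print("max |analytic - numeric| for W:", max_abs_diff(grad_w, num_w))
```

If a custom operator takes `U` as its input, only `grad_u` has to be produced by hand; the two matrix products above are the part a framework's autograd would supply.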

About Grad: gradient check failed in some case, how to correct calculate x's gradient ?

Hi @kzjeef, as you can see in my code and musyoku's code, there are two possible ways to implement it. (1) You define your operator to take `U` as input instead...

Error when use SRU in DrQA

@wangwei7175878 Hi, could you check that you are using CUDA 8 and that PyTorch is using the GPU, i.e. that `torch.cuda.is_available()` returns `True`?

What if turning off the highway connection?

@byzhang Hi, I haven't done very thorough experiments on this. I noticed that the highway connection helps quite a bit on the language modeling task in our earlier ICML work, so I stick with...

Mixed Precision Training

@Zadagu SRU / SRU++ can work with PyTorch's native mixed precision training. See this example: https://github.com/asappresearch/sru/blob/3.0.0-dev/experiments/srupp_experiments/train_enwik8.py#L250

Is it possible to add cross attention aka encoder outputs to SRU++ ?

@hadaev8 yes and no. Yes in the sense that within each SRU++ layer, the layer will attend to both self outputs and the memory inputs. No in the sense that...

speed

Hi, sorry for the late reply. Did you check the GPU usage (e.g. `nvidia-smi`) while running the code? I suspect the usage is low and the bottleneck is IO instead...

AttributeError: 'bytes' object has no attribute 'encode'

Hi, the new version of `pynvrtc` assumes bytes objects instead of str objects as input to Program(). You can fix the error by removing the `encode('utf-8')` and `encode()` calls in: `_SRU_PROG...
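The error in the title is what Python raises whenever `.encode()` is called on a value that is already `bytes`. The sketch below reproduces that message and shows two type-tolerant helpers; it makes no `pynvrtc` call, and `as_text`, `as_bytes`, and the placeholder source string are hypothetical names introduced only for illustration.

```python
def as_text(source):
    """Return source as str, decoding only when it is bytes."""
    if isinstance(source, bytes):
        return source.decode("utf-8")
    return source


def as_bytes(source):
    """Return source as bytes, encoding only when it is still a str."""
    if isinstance(source, str):
        return source.encode("utf-8")
    return source


kernel_source = "// placeholder kernel text"  # hypothetical stand-in
encoded = kernel_source.encode("utf-8")       # first encode: str -> bytes

try:
    encoded.encode("utf-8")                   # second encode on bytes fails
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(error_message)
print(as_text(encoded) == kernel_source)
```

Normalizing once at the boundary, in whichever direction the installed library expects, avoids encoding the same value twice.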

Gradient calculation error?

Hi @Sunnydreamrain, no, `grad_last` is not always zero. In some cases the model will pass the last cell state `c_t` into subsequent model components. For example, in a sequence-to-sequence task...
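A toy scalar recurrence shows why a nonzero `grad_last` matters. This is a sketch under loud assumptions: a constant forget gate `f`, the update `c_t = f * c_{t-1} + (1 - f) * x_t`, and hand-picked losses. It is not the SRU CUDA kernel, and the function names are hypothetical.

```python
def forward(xs, f, c0=0.0):
    """Run the toy recurrence and return every cell state."""
    c, states = c0, []
    for x in xs:
        c = f * c + (1.0 - f) * x
        states.append(c)
    return states


def backward(grad_states, grad_last, f):
    """Backpropagate through the toy recurrence.

    grad_states[t] is dL/dc_t from the per-step outputs;
    grad_last is dL/dc_T from whatever consumes the final state.
    """
    grad_xs = [0.0] * len(grad_states)
    carry = grad_last  # gradient flowing into the final cell state
    for t in reversed(range(len(grad_states))):
        total = grad_states[t] + carry
        grad_xs[t] = (1.0 - f) * total  # d c_t / d x_t = 1 - f
        carry = f * total               # d c_t / d c_{t-1} = f
    return grad_xs


xs, f = [1.0, 2.0, 3.0], 0.5
states = forward(xs, f)

# Case 1: the loss only sums the per-step states, so grad_last is zero.
grads_without_last = backward([1.0, 1.0, 1.0], 0.0, f)

# Case 2: a downstream component (e.g. a decoder initial state) also
# contributes 2 * c_T to the loss, so grad_last is 2.
grads_with_last = backward([1.0, 1.0, 1.0], 2.0, f)

print(grads_without_last)  # [0.875, 0.75, 0.5]
print(grads_with_last)     # [1.125, 1.25, 1.5]
```

Dropping `grad_last` in the second case would silently give the first set of gradients, which is wrong for every time step, not only the last one.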