Comments by Tao Lei (41 results)

Hi, the CUDA implementation assumes float32 inputs and outputs. The implementation would need to be rewritten as templates in order to support other float types.
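
Until the kernels are templated, one workaround on the Python side is to cast around the SRU call. A minimal sketch (the helper name is mine, assuming `rnn` is an `sru.SRU` module; not something shipped with the repo):

```python
import torch

def run_in_fp32(rnn, x):
    # The CUDA kernel currently expects float32, so cast the input down
    # before the call and cast the outputs back to the original dtype.
    out, state = rnn(x.float())
    return out.to(x.dtype), state.to(x.dtype)
```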

@calclavia I've been using CUDA 9 (with FP32) and it was fine. Didn't the default pip install work? `pip install cupy-cuda90` should do it; cupy already supports CUDA 9: https://cupy.chainer.org/

Ah, yeah. `requirements.txt` doesn't include the CUDA dependency. Could you try installing CUDA 9 and `cupy-cuda90` manually?

This seems related to the FAQ (https://docs-cupy.chainer.org/en/stable/install.html). Does cupy find the CUDA path successfully and correctly? `libcublas.so.9.1` should lie in a directory such as `/usr/local/cuda/lib64`. Usually there is...
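
A quick sanity check from Python (just a sketch; the `CUDA_PATH` environment variable is what cupy's install docs mention, not something from this thread):

```python
import os
import cupy

print(os.environ.get("CUDA_PATH"))                    # toolkit location cupy will use, if set
print(cupy.cuda.runtime.runtimeGetVersion())          # e.g. 9000 for CUDA 9.0
x = cupy.zeros(4, dtype=cupy.float32)
print(cupy.cuda.runtime.getDevice(), float(x.sum()))  # confirms a kernel actually runs
```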

I'm using CUDA 9.0, not 9.1. The versions I have:
- `torch.__version__`: 0.3.1
- `cupy.__version__`: 4.1.0
- `pynvrtc.__version__`: 8.0

Hi @kzjeef, no, this is not an issue. For bi-SRU, the highway sub-layer is computed as `h[t] = r[t] * concatenate(h_f[t], h_b[t]) + (1 - r[t]) * x[t]`; the final output concatenates...
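
For concreteness, a minimal PyTorch sketch of that highway combination (variable names are mine; `h_f`/`h_b` are the per-direction states of size `d`, and `x` is the `2*d`-wide layer input):

```python
import torch

batch, d = 8, 128
h_f = torch.randn(batch, d)                     # forward-direction state at step t
h_b = torch.randn(batch, d)                     # backward-direction state at step t
x   = torch.randn(batch, 2 * d)                 # layer input at step t
r   = torch.sigmoid(torch.randn(batch, 2 * d))  # highway gate

# h[t] = r[t] * concatenate(h_f[t], h_b[t]) + (1 - r[t]) * x[t]
h = r * torch.cat([h_f, h_b], dim=-1) + (1 - r) * x
print(h.shape)  # torch.Size([8, 256])
```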

The concatenation happens implicitly in the forward and backward kernels. When `bidirectional = True`, the hidden dimension becomes `d*2` instead of `d`, and the first `d` dimensions represent the "left...
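
As a quick shape check (a sketch assuming a recent `sru` package and the usual `(length, batch, dim)` layout):

```python
import torch
from sru import SRU

length, batch, d = 10, 4, 128
x = torch.randn(length, batch, d)

rnn = SRU(input_size=d, hidden_size=d, num_layers=2, bidirectional=True)
output, state = rnn(x)

print(output.shape)  # (10, 4, 256): last dim is d*2, forward half then backward half
print(state.shape)   # (2, 4, 256): one d*2 state per layer
```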

Hi @NickShahML, we use highway connections (Eq. 7) instead of identity connections (residual); this is implemented in the CUDA code. Comparing highway with identity (or the version without any skip connections)...
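
To make the distinction concrete, a toy sketch of the two kinds of skip connections (my own illustration, not the CUDA code; `fx` stands in for the output of the recurrent transformation):

```python
import torch

def residual_connection(x, fx):
    # identity / residual: output = f(x) + x
    return fx + x

def highway_connection(x, fx, r):
    # highway (Eq. 7): a learned gate r interpolates between f(x) and x
    return r * fx + (1 - r) * x

x  = torch.randn(4, 128)
fx = torch.randn(4, 128)                 # pretend output of the recurrent transform
r  = torch.sigmoid(torch.randn(4, 128))  # gate, normally computed from x

print(residual_connection(x, fx).shape, highway_connection(x, fx, r).shape)
```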

@NickShahML I tried residual a bit on the ICML language modeling task. The training loss decreases much more slowly compared to using highway, so I stopped given time & resource constraints. Of...