Results: 41 comments by Tao Lei

Of course, when `c_t` is never used in subsequent computation, PyTorch provides a `grad_last` that is all zeros.
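A minimal sketch of this behaviour, using a toy custom `torch.autograd.Function` rather than SRU's actual kernel. The class `ToyRecurrence`, its cumulative-sum "state" and the `observed_grad_last` list are hypothetical and exist only for illustration. The function returns both the per-step outputs `h` and a final state `c_last`; because `ctx.set_materialize_grads` defaults to `True`, autograd hands `backward` a zero tensor for `c_last` when it never enters the loss.

```python
import torch

# Hypothetical record of the gradient autograd passes for the final state.
observed_grad_last = []


class ToyRecurrence(torch.autograd.Function):
    """Toy stand-in for a recurrent kernel that returns (h, c_last).

    State: c[t] = c[t-1] + x[t] (a cumulative sum); output: h[t] = tanh(c[t]).
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        c = torch.cumsum(x, dim=0)
        return torch.tanh(c), c[-1].clone()

    @staticmethod
    def backward(ctx, grad_h, grad_last):
        (x,) = ctx.saved_tensors
        observed_grad_last.append(grad_last.detach().clone())
        c = torch.cumsum(x, dim=0)
        # Gradient w.r.t. each state c[t] through h[t] = tanh(c[t]).
        grad_c = grad_h * (1 - torch.tanh(c) ** 2)
        # The last state c[-1] also receives the gradient of c_last.
        grad_c[-1] += grad_last
        # c[s] depends on x[t] for every t <= s, so accumulate from the end.
        grad_x = torch.flip(torch.cumsum(torch.flip(grad_c, [0]), dim=0), [0])
        return grad_x


x = torch.randn(5, 3, requires_grad=True)
h, c_last = ToyRecurrence.apply(x)
h.sum().backward()  # c_last never enters the loss
print(observed_grad_last[-1])  # a zero tensor of shape (3,)
```

If `ctx.set_materialize_grads(False)` were called inside `forward`, the unused gradient would arrive as `None` instead, so a kernel that wants to skip that work would have to check for it explicitly.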

Hi @cbasavaraj, this sounds very interesting. I will put it on the TODO list, but I won't have time to try it soon; I have quite a few other tasks to do.

Hi @anis016, I don't know whether there have been significant changes to Hitvoice/DrQA since I forked the repo and added SRU support. There are a couple of changes I made so...

Hi,
- The major change to master is commit https://github.com/taolei87/sru/commit/bcc6cde62cdb19f0f4d23a2ca548d8e63fe683c5, in which a scaling constant is introduced in the highway transformation: `h[t] = r[t] * c[t] +...`
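A minimal sketch of a highway output with a constant `alpha` scaling the skip term, for one time step over plain Python lists. The formula in the comment is cut off, so the second term here, `(1 - r[t]) * x[t] * alpha`, is an assumption rather than a quote of the commit; the helper name `scaled_highway` and the value `sqrt(3)` are likewise illustrative only.

```python
import math


def scaled_highway(r, c, x, alpha):
    """Element-wise highway output with a scaling constant on the skip term.

    Assumed form (the formula in the comment is truncated):
        h[t] = r[t] * c[t] + (1 - r[t]) * x[t] * alpha
    r, c and x are equal-length lists of floats for one time step.
    """
    if not (len(r) == len(c) == len(x)):
        raise ValueError("r, c and x must have the same length")
    return [ri * ci + (1.0 - ri) * xi * alpha for ri, ci, xi in zip(r, c, x)]


# alpha = sqrt(3) is only an illustrative choice here.
alpha = math.sqrt(3)
print(scaled_highway([1.0, 0.0, 0.5], [0.2, 0.2, 0.2], [1.0, 1.0, 1.0], alpha))
```

With `r[t] = 1` the output is just `c[t]`, and with `r[t] = 0` it is the input scaled by `alpha`, so under this assumed form the constant only changes how much of `x[t]` flows through the highway path.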

Hi, thank you for the PR!
- I haven't tested the new version on the language modeling task. I'm currently focusing on translation tasks and can check this later. The...

Sounds great! Let me know if you have more questions / issues. :)

Hi @hangcao1004, wouldn't adding `printf` directly in the CUDA kernel work? See http://15418.courses.cs.cmu.edu/spring2013/article/15 and https://stackoverflow.com/questions/28257111/how-does-printf-work-on-cuda-compute-2/28361950

Hi @hoagy-davis-digges, do you mean you tried the following example and got NaN?
```
import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32...
```

One solution is to skip ninja, which the new version does. Does installing the new version 2.1.3 solve your problem?

Hi @liziru, thanks for the great questions.
(1) Difference between the two versions. In our first arXiv version, we didn't include the element-wise hidden-to-hidden connection (`v * c_{t-1}`)...
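A minimal sketch of a recurrence with the element-wise hidden-to-hidden connection, for a single hidden dimension. The gate form (the forget gate `f[t]` and the highway gate `r[t]` each adding `v * c[t-1]` before the sigmoid), the helper name `sru_like_scan` and its parameters are assumptions for illustration, not the SRU implementation; the matrix products `W x[t]` are taken as precomputed inputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sru_like_scan(wx, wfx, wrx, x, v_f, v_r, b_f, b_r, c0=0.0):
    """Single-dimension SRU-style recurrence (hypothetical helper).

    wx, wfx, wrx: precomputed projections W x[t], W_f x[t], W_r x[t].
    v_f, v_r: element-wise hidden-to-hidden weights applied to c[t-1];
    set both to 0.0 to drop the v * c[t-1] connection.
    Returns the list of outputs h[t] and the final state c.
    """
    c = c0
    hs = []
    for t in range(len(x)):
        # Both gates read the previous state c[t-1] through v * c[t-1].
        f = sigmoid(wfx[t] + v_f * c + b_f)
        r = sigmoid(wrx[t] + v_r * c + b_r)
        c = f * c + (1.0 - f) * wx[t]        # new state c[t]
        hs.append(r * c + (1.0 - r) * x[t])  # highway output h[t]
    return hs, c


hs, c_last = sru_like_scan([1.0], [0.0], [0.0], [2.0], 0.0, 0.0, 0.0, 0.0)
print(hs, c_last)  # [1.25] 0.5
```

With `v_f = v_r = 0` the gates depend only on `x[t]`, so they can be computed for all time steps at once and only the cheap element-wise update of `c[t]` stays sequential; with the `v * c[t-1]` terms the gates have to be evaluated inside the loop.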