Tao Lei comments

Results: 41 comments by Tao Lei

Gradient calculation error?

Of course, when `c_t` is never used in subsequent computation, PyTorch provides a `grad_last` that is all zeros.
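To illustrate the point, here is a minimal sketch using a toy `torch.autograd.Function` with two outputs. The class `TwoOutputs` and its arithmetic are hypothetical stand-ins, not the SRU kernel; the sketch assumes PyTorch's default behavior of materializing undefined output gradients as zero tensors.

```python
import torch


class TwoOutputs(torch.autograd.Function):
    """Toy stand-in (hypothetical, not the SRU kernel).

    Returns a per-step output `h` and a final state `c_last`,
    mirroring the (output, last state) pair discussed above.
    """

    seen_grad_last = None  # records the gradient that backward received

    @staticmethod
    def forward(ctx, x):
        h = x * 2.0            # per-step output
        c_last = x[-1] * 3.0   # "last state", derived from the final step
        return h, c_last

    @staticmethod
    def backward(ctx, grad_h, grad_last):
        # Keep a copy so the gradient can be inspected after backward().
        TwoOutputs.seen_grad_last = grad_last.clone()
        grad_x = grad_h * 2.0
        grad_x[-1] += grad_last * 3.0
        return grad_x


x = torch.randn(4, 3, requires_grad=True)
h, c_last = TwoOutputs.apply(x)

# Only `h` contributes to the loss; `c_last` is never used afterwards.
h.sum().backward()

print(TwoOutputs.seen_grad_last)  # gradient passed in for the unused output
```

Because `c_last` does not feed into the loss, the gradient passed to `backward` for it contributes nothing, so `x.grad` depends only on `grad_h`.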

Recurrent BatchNorm for SRU?

Hi @cbasavaraj, this sounds very interesting. I will put it on the TODO list, but I won't be able to try it any time soon. I have quite a few other tasks to do.

DrQA tasks doesn't perform good

Hi @anis016, I don't know whether there have been significant changes to Hitvoice/DrQA since I forked the repo and added SRU support. There are a couple of changes I made so...

Latest master SRU fails to train

Hi, - The major change to master is commit https://github.com/taolei87/sru/commit/bcc6cde62cdb19f0f4d23a2ca548d8e63fe683c5 , in which a scaling constant term is introduced in the highway transformation: `h[t] = r[t] * c[t] +...

Latest master SRU fails to train

Hi, thank you for the PR! - I didn't test the new version on the language modeling task. I'm now focusing on translation tasks, and can check on this later. The...

Latest master SRU fails to train

Sounds great! Let me know if you have more questions / issues. :)

About sru_cuda_kernel.cu

Hi @hangcao1004, wouldn't directly adding `printf` work? See http://15418.courses.cs.cmu.edu/spring2013/article/15 and https://stackoverflow.com/questions/28257111/how-does-printf-work-on-cuda-compute-2/28361950

Nan in output from example code

Hi @hoagy-davis-digges, did you mean you tried the following example and got NaN?

```
import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32...
```

[Errno 2] No such file or directory: 'ninja': 'ninja''

One solution is to ignore ninja, which the new version does. Does installing the new version 2.1.3 solve your problem?

Confusion about computation in paper 'Simple Recurrent Units for Highly Parallelizable Recurrence' ?

Hi @liziru, thanks for posting these great questions. (1) Difference between the two versions. In our first arXiv version, we didn't include the element-wise hidden-to-hidden connection (`v * c_{t-1}`)...