
Bidirectional forward and backward seem incorrect: each direction only captures half of input_x in the element-wise kernel

Status: Open · kzjeef opened this issue 6 years ago · 3 comments

Hi Tao,

I recently found an issue in the bidirectional case. Take input_size = 6, hidden_size = 3, direction_count = 2, length = 2, batch_size = 2; in this case k == 3.

The x matrix looks like this (the f or r prefix before each number marks whether it is read by the forward or the reverse (flip == 1) direction):

[[[f-0.302948 f-0.255578 f-0.110915  r0.1591    r0.928114  r0.92241 ]
  [f-0.50604   f0.391675 f-0.187608  r0.468802 r-0.648262 r-0.177739]]

 [[ f0.50936   f0.67189  f-0.619738  r0.377355  r0.545083 r-0.971449]
  [ f0.948531 f-0.551092  f0.227567 r-0.46116  r-0.496896 r-0.769874]]]

I added a print to the forward kernel; it outputs:

[F] col:0  L:0 N:0 D:0 DIR:0 act:1 k:3 d:3 x: -0.302948 
[F] col:1  L:0 N:0 D:1 DIR:0 act:1 k:3 d:3 x: -0.255578 
[F] col:2  L:0 N:0 D:2 DIR:0 act:1 k:3 d:3 x: -0.110915 
[F] col:3  L:0 N:0 D:0 DIR:1 act:1 k:3 d:3 x: 0.377355 
[F] col:4  L:0 N:0 D:1 DIR:1 act:1 k:3 d:3 x: 0.545083 
[F] col:5  L:0 N:0 D:2 DIR:1 act:1 k:3 d:3 x: -0.971449 
[F] col:6  L:0 N:1 D:0 DIR:0 act:1 k:3 d:3 x: -0.506040 
[F] col:7  L:0 N:1 D:1 DIR:0 act:1 k:3 d:3 x: 0.391675 
[F] col:8  L:0 N:1 D:2 DIR:0 act:1 k:3 d:3 x: -0.187608 
[F] col:9  L:0 N:1 D:0 DIR:1 act:1 k:3 d:3 x: -0.461160 
[F] col:10  L:0 N:1 D:1 DIR:1 act:1 k:3 d:3 x: -0.496896
[F] col:11  L:0 N:1 D:2 DIR:1 act:1 k:3 d:3 x: -0.769874
[F] col:0  L:1 N:0 D:0 DIR:0 act:1 k:3 d:3 x: 0.509360 
[F] col:1  L:1 N:0 D:1 DIR:0 act:1 k:3 d:3 x: 0.671890 
[F] col:2  L:1 N:0 D:2 DIR:0 act:1 k:3 d:3 x: -0.619738 
[F] col:3  L:1 N:0 D:0 DIR:1 act:1 k:3 d:3 x: 0.159100 
[F] col:4  L:1 N:0 D:1 DIR:1 act:1 k:3 d:3 x: 0.928114 
[F] col:5  L:1 N:0 D:2 DIR:1 act:1 k:3 d:3 x: 0.922410 
[F] col:6  L:1 N:1 D:0 DIR:0 act:1 k:3 d:3 x: 0.948531 
[F] col:7  L:1 N:1 D:1 DIR:0 act:1 k:3 d:3 x: -0.551092 
[F] col:8  L:1 N:1 D:2 DIR:0 act:1 k:3 d:3 x: 0.227567 
[F] col:9  L:1 N:1 D:0 DIR:1 act:1 k:3 d:3 x: 0.468802 
[F] col:10  L:1 N:1 D:1 DIR:1 act:1 k:3 d:3 x: -0.648262 
[F] col:11  L:1 N:1 D:2 DIR:1 act:1 k:3 d:3 x: -0.177739 

For your reference, this is the code I added to print the values:

void sru_bi_fwd(...) {
    for (int row = 0; row < len; ++row)
    {
        ...
        *hp = (val*mask-(*xp))*g2 + (*xp);
        printf("[F] col:%d  L:%d N:%d D:%d DIR:%d act:%d k:%d d:%d x:%f\n",
               col, cnt, (col/d2), (col%d), flip, activation_type, k, d, *(xp));
        ...
    }
}

I found that for the forward direction (flip == 0, printed as DIR) the code only accesses the left half of input x, and for the backward direction (flip == 1) it only accesses the right half of input x.

This behavior is very different from every other case in SRU (k == 3 unidirectional, and k == 4 unidirectional or bidirectional). Each direction can only keep half of the input's information through the reset gate, even though the activation sees all of x.

Do you think this is an issue?

kzjeef, May 31 '18 11:05

Hi @kzjeef

No. This is not an issue. For bi-SRU, the highway sub-layer is computed as:

h[t] = r[t] * concatenate(h_f[t], h_b[t]) + (1-r[t]) * x[t]

The final output concatenates the states of both directions and adds the input x[t] as well. The states of both directions and the input x[t] are all used in subsequent layers.
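A minimal Python sketch of this formula for a single time step and batch element, assuming a per-direction hidden size d and (as in the k == 3 case) an input x[t] of width 2*d. `bi_highway` is a hypothetical helper, not a function from cuda_functional.py, and the numbers are toy values. It shows why each half of x only meets one direction: the skip term is element-wise, so column j of x is mixed only into column j of the concatenated state.

```python
def bi_highway(h_f, h_b, r, x):
    """Sketch of h[t] = r[t] * concatenate(h_f[t], h_b[t]) + (1 - r[t]) * x[t]
    for one time step and one batch element (hypothetical helper).

    h_f, h_b: forward / backward states, each of length d
    r, x:     gate and input, each of length 2*d; with k == 3 the input
              must have width 2*d so it lines up with the concatenation
    """
    d = len(h_f)
    assert len(h_b) == d and len(r) == 2 * d and len(x) == 2 * d
    state = list(h_f) + list(h_b)  # concatenate(h_f[t], h_b[t])
    # Element-wise mix: column j of x only ever meets column j of the state,
    # so the left half of x pairs with h_f and the right half with h_b.
    return [r_j * s_j + (1.0 - r_j) * x_j for r_j, s_j, x_j in zip(r, state, x)]


# Toy numbers with d = 3.
h = bi_highway(h_f=[1.0, 1.0, 1.0], h_b=[2.0, 2.0, 2.0],
               r=[0.5] * 6, x=[0.0, 0.0, 0.0, 4.0, 4.0, 4.0])
print(h)  # [0.5, 0.5, 0.5, 3.0, 3.0, 3.0]
```

The whole input still reaches the next layer: the left half travels with the forward state and the right half with the backward state.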

taolei87, Jun 02 '18 20:06

Hi @taolei87,

Thanks for your reply.

I'm not quite clear on what you mean by the highway sub-layer computation. I don't see concatenate(h_f[t], h_b[t]) in sru_forward(), sru_backward(), or any other Python code in cuda_functional.py. Would you mind pointing out where that code is, or is it perhaps in the example code?

kzjeef, Jun 04 '18 03:06

The concatenation happens implicitly in the forward and backward kernels. When bidirectional = True, the hidden dimension becomes d*2 instead of d: the first d dimensions hold the "left to right" hidden states and the second half holds the "right to left" direction.

The for loop in the kernel function runs in the reverse direction for the second half. See, for example: https://github.com/taolei87/sru/blob/master/cuda_functional.py#L223-L228
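As a rough illustration of this indexing, here is a small Python model of which x element each column reads on each loop iteration. It is a sketch under assumptions, not the actual kernel: it assumes x is stored row-major as (length, batch, 2*d), one thread per column, and that a column runs right to left when col % (2*d) >= d. The name `bi_read_pattern` is hypothetical. Fed the x matrix from the issue, it produces the same x values, in the same order, as the [F] log above.

```python
def bi_read_pattern(x_flat, length, batch, d):
    """Hypothetical model of the k == 3 bidirectional element-wise loop.

    Assumes x_flat is x stored row-major with shape (length, batch, 2*d),
    one thread per column col in range(batch * 2*d), and that a column is
    "flipped" (right to left) when col % (2*d) >= d.
    Returns (cnt, col, flip, t, value) for every read, ordered like the log.
    """
    d2 = 2 * d
    ncols = batch * d2  # distance in x_flat between consecutive time steps
    reads = []
    for cnt in range(length):        # loop iteration, printed as L in the log
        for col in range(ncols):     # one thread per column
            flip = (col % d2) >= d   # second half of each 2*d block runs backward
            t = length - 1 - cnt if flip else cnt  # flipped columns start at the last step
            # The offset inside a time step is always `col`, so a thread
            # only ever touches its own half of x.
            reads.append((cnt, col, flip, t, x_flat[t * ncols + col]))
    return reads


# The x matrix from the issue, flattened as (length=2, batch=2, 2*d=6).
d = 3
x = [
    -0.302948, -0.255578, -0.110915, 0.1591, 0.928114, 0.92241,      # t=0, batch 0
    -0.50604, 0.391675, -0.187608, 0.468802, -0.648262, -0.177739,   # t=0, batch 1
    0.50936, 0.67189, -0.619738, 0.377355, 0.545083, -0.971449,      # t=1, batch 0
    0.948531, -0.551092, 0.227567, -0.46116, -0.496896, -0.769874,   # t=1, batch 1
]
reads = bi_read_pattern(x, length=2, batch=2, d=d)
for cnt, col, flip, t, value in reads:
    print(f"col:{col} L:{cnt} N:{col // (2 * d)} D:{col % d} DIR:{int(flip)} t:{t} x:{value:f}")
```

Because each column keeps a fixed offset within a time step, the left-half/right-half split seen in the log is what lines column j of x up with column j of concatenate(h_f[t], h_b[t]) in the highway formula.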

taolei87, Jun 07 '18 14:06