sru
Bi-directional forward and backward seem incorrect: each direction only captures half of input_x element-wise
Hi Tao,
I recently found an issue in the bi-directional case. For example, with input_size = 6, hidden_size = 3, direction_count = 2, length = 2, batch_size = 2, we have k == 3,
and the x matrix looks like this (the f or r letter before each number marks the forward and reverse (flip == 1) direction):
[[[f-0.302948 f-0.255578 f-0.110915 r0.1591 r0.928114 r0.92241 ]
[f-0.50604 f0.391675 f-0.187608 r0.468802 r-0.648262 r-0.177739]]
[[ f0.50936 f0.67189 f-0.619738 r0.377355 r0.545083 r-0.971449]
[ f0.948531 f-0.551092 f0.227567 r-0.46116 r-0.496896 r-0.769874]]]
I added a print in the forward kernel; here is its output:
[F] col:0 L:0 N:0 D:0 DIR:0 act:1 k:3 d:3 x: -0.302948
[F] col:1 L:0 N:0 D:1 DIR:0 act:1 k:3 d:3 x: -0.255578
[F] col:2 L:0 N:0 D:2 DIR:0 act:1 k:3 d:3 x: -0.110915
[F] col:3 L:0 N:0 D:0 DIR:1 act:1 k:3 d:3 x: 0.377355
[F] col:4 L:0 N:0 D:1 DIR:1 act:1 k:3 d:3 x: 0.545083
[F] col:5 L:0 N:0 D:2 DIR:1 act:1 k:3 d:3 x: -0.971449
[F] col:6 L:0 N:1 D:0 DIR:0 act:1 k:3 d:3 x: -0.506040
[F] col:7 L:0 N:1 D:1 DIR:0 act:1 k:3 d:3 x: 0.391675
[F] col:8 L:0 N:1 D:2 DIR:0 act:1 k:3 d:3 x: -0.187608
[F] col:9 L:0 N:1 D:0 DIR:1 act:1 k:3 d:3 x: -0.461160
[F] col:10 L:0 N:1 D:1 DIR:1 act:1 k:3 d:3 x: -0.496896
[F] col:11 L:0 N:1 D:2 DIR:1 act:1 k:3 d:3 x: -0.769874
[F] col:0 L:1 N:0 D:0 DIR:0 act:1 k:3 d:3 x: 0.509360
[F] col:1 L:1 N:0 D:1 DIR:0 act:1 k:3 d:3 x: 0.671890
[F] col:2 L:1 N:0 D:2 DIR:0 act:1 k:3 d:3 x: -0.619738
[F] col:3 L:1 N:0 D:0 DIR:1 act:1 k:3 d:3 x: 0.159100
[F] col:4 L:1 N:0 D:1 DIR:1 act:1 k:3 d:3 x: 0.928114
[F] col:5 L:1 N:0 D:2 DIR:1 act:1 k:3 d:3 x: 0.922410
[F] col:6 L:1 N:1 D:0 DIR:0 act:1 k:3 d:3 x: 0.948531
[F] col:7 L:1 N:1 D:1 DIR:0 act:1 k:3 d:3 x: -0.551092
[F] col:8 L:1 N:1 D:2 DIR:0 act:1 k:3 d:3 x: 0.227567
[F] col:9 L:1 N:1 D:0 DIR:1 act:1 k:3 d:3 x: 0.468802
[F] col:10 L:1 N:1 D:1 DIR:1 act:1 k:3 d:3 x: -0.648262
[F] col:11 L:1 N:1 D:2 DIR:1 act:1 k:3 d:3 x: -0.177739
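To make the access pattern concrete, here is a small pure-Python sketch of the index arithmetic as I reconstructed it from the log above (my own model, not the kernel code): `x_index` maps a kernel column and loop step to the (time, batch, feature) element of x that gets read, and it reproduces the printed values.

```python
def x_index(col, step, length, d):
    """Reconstructed access pattern: col runs over batch * 2d lanes.

    The second half of each sample's lanes (flip) walks time in
    reverse and reads the right half of x's features.
    """
    d2 = 2 * d
    n = col // d2                  # batch index (N in the log)
    flip = (col % d2) >= d         # direction (DIR in the log)
    t = length - 1 - step if flip else step
    j = col % d2                   # feature index into x's 2d features
    return t, n, j

# The x matrix from the example above (length=2, batch=2, 2d=6).
x = [[[-0.302948, -0.255578, -0.110915, 0.159100, 0.928114, 0.922410],
      [-0.506040, 0.391675, -0.187608, 0.468802, -0.648262, -0.177739]],
     [[0.509360, 0.671890, -0.619738, 0.377355, 0.545083, -0.971449],
      [0.948531, -0.551092, 0.227567, -0.461160, -0.496896, -0.769874]]]

# "[F] col:3 L:0 ... DIR:1 ... x: 0.377355" -> reverse lane reads t=1, j=3
t, n, j = x_index(3, 0, length=2, d=3)
assert (t, n, j) == (1, 0, 3) and x[t][n][j] == 0.377355
```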
For your reference, this is the code added to print the value:
void sru_bi_fwd(...) {
    for (int row = 0; row < len; ++row)
    {
        ...
        ...
        *hp = (val*mask - (*xp))*g2 + (*xp);
        printf("[F] col:%d L:%d N:%d D:%d DIR:%d act:%d k:%d d:%d x:%f\n",
               col, cnt, (col/d2), (col%d), flip, activation_type, k, d, *(xp));
    }
}
I found that the forward-direction code (flip == 0, printed as DIR) only accesses the left half of the input x, and the backward direction (flip == 1) only accesses the right half.
This behavior is very different from every other case in SRU (k == 3 uni-directional, and k == 4 uni-/bi-directional): through the reset gate the highway connection can only keep half of the input's information, even though the activation sees all of x.
Do you think this is an issue?
Hi @kzjeef
No. This is not an issue. For bi-SRU, the highway sub-layer is computed as:
h[t] = r[t] * concatenate(h_f[t], h_b[t]) + (1-r[t]) * x[t]
the final output concatenates the states of both directions, and adds the input x[t] as well. The states of both directions and the input x[t] will be used in subsequent layers.
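A minimal NumPy sketch of this formula (shapes assumed from the example above: d = 3 hidden units per direction, so x and r have 2d features):

```python
import numpy as np

length, batch, d = 2, 2, 3
rng = np.random.default_rng(0)

h_f = rng.standard_normal((length, batch, d))       # left-to-right states
h_b = rng.standard_normal((length, batch, d))       # right-to-left states
x = rng.standard_normal((length, batch, 2 * d))     # highway input
r = 1.0 / (1.0 + np.exp(-rng.standard_normal((length, batch, 2 * d))))  # reset gate

# h[t] = r[t] * concat(h_f[t], h_b[t]) + (1 - r[t]) * x[t]
h = r * np.concatenate([h_f, h_b], axis=-1) + (1 - r) * x

# Element-wise, the first d channels mix h_f with the left half of x and
# the last d channels mix h_b with the right half -- which is why each
# direction only touches half of x inside the kernel.
assert np.allclose(h[..., :d], r[..., :d] * h_f + (1 - r[..., :d]) * x[..., :d])
assert np.allclose(h[..., d:], r[..., d:] * h_b + (1 - r[..., d:]) * x[..., d:])
```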
Hi @taolei87,
Thanks for your reply.
I'm not quite clear on what you mean by the highway sub-layer computation: I don't see the concatenate(h_f[t], h_b[t]) in the cuda_functional.py code, in sru_forward(), sru_backward(), or the other Python code. Would you mind pointing out where that code is, or maybe where it appears in the examples' code?
The concatenation happens in the forward and backward kernels implicitly.
When bidirectional = True, the hidden dimension becomes d*2 instead of d. The first d dimensions represent the "left to right" hidden states, and the second half represents the "right to left" direction.
The for loop in the kernel function operates in the reverse direction for the second half. See here for example: https://github.com/taolei87/sru/blob/master/cuda_functional.py#L223-L228
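The reversed loop can be sketched in pure Python (a simplified model of the recurrence, not the actual CUDA kernel): the first d lanes scan left to right, the last d lanes scan right to left, so each c[t] already holds the concatenation of both directions' states.

```python
def bi_recurrence_sketch(x, f, d):
    """Simplified bi-directional recurrence: c = f*c_prev + (1-f)*x.

    x, f: nested lists of shape (length, 2*d). The first d lanes scan
    t = 0..length-1; the last d lanes scan t = length-1..0.
    """
    length = len(x)
    c = [[0.0] * (2 * d) for _ in range(length)]
    for j in range(2 * d):
        flip = j >= d                                    # second half: reverse scan
        steps = range(length - 1, -1, -1) if flip else range(length)
        prev = 0.0
        for t in steps:
            prev = f[t][j] * prev + (1.0 - f[t][j]) * x[t][j]
            c[t][j] = prev
    return c

# d=1: lane 0 runs forward, lane 1 runs backward over the sequence.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
f = [[0.5, 0.5]] * 3
c = bi_recurrence_sketch(x, f, 1)
# forward lane: 0.5, 1.25, 2.125; backward lane: 13.75, 17.5, 15.0
assert c == [[0.5, 13.75], [1.25, 17.5], [2.125, 15.0]]
```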