byteNet-tensorflow
the padding for dilated convolution seems to be wrong
For the encoder, the dilated convolution pads `(filter_width - 1) * dilation/2` zeros on each side, which means that after reshaping there are `(filter_width - 1)` zeros at the beginning. But `conv1d` uses `SAME` padding, which again pads `filter_width - 1` zeros, duplicating the zeros that are needed.
Assume `filter_width=3, dilation=2`, and the input is

1 2 3 4 5 6

After padding in the dilated convolution function, the input becomes

0 0 1 2 3 4 5 6 0 0
After the reshape,

0 1 3 5 0
0 2 4 6 0

becomes the input to `conv1d`, which will again pad `filter_width - 1` zeros under the `SAME` padding scheme:

0 0 1 3 5 0 0
0 0 2 4 6 0 0
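The arithmetic above is easy to check directly. A minimal numpy sketch (variable names are mine, not the repo's):

```python
import numpy as np

filter_width, dilation = 3, 2
x = np.array([1, 2, 3, 4, 5, 6])

# Explicit padding added by the dilated-convolution wrapper:
# (filter_width - 1) * dilation // 2 zeros on each side.
pad = (filter_width - 1) * dilation // 2
padded = np.pad(x, (pad, pad), mode='constant')
print(padded)  # [0 0 1 2 3 4 5 6 0 0]

# The reshape interleaves the sequence into `dilation` rows.
rows = padded.reshape(-1, dilation).T
print(rows)    # [[0 1 3 5 0]
               #  [0 2 4 6 0]]

# conv1d's SAME scheme then implicitly pads each row with another
# filter_width - 1 zeros in total, duplicating the padding.
same = np.pad(rows, ((0, 0), (1, 1)), mode='constant')
print(same)    # [[0 0 1 3 5 0 0]
               #  [0 0 2 4 6 0 0]]
```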
For `filter_width=3, dilation=1` => `dilation/2 = 0` => `(filter_width - 1) * dilation/2 = 0`. So the input will not be padded.
Sorry, that is a typo; dilation = 2.
I just tested. For the decoder, if I change the original padding to

padding = [[0, 0], [(filter_width - 1) * dilation/2, (filter_width - 1) * dilation/2], [0, 0]]

the result looks normal. But if I change it to

padding = [[0, 0], [0, 0], [0, 0]]

the loss is close to 0 (a near-zero loss usually means the decoder is being shown the very targets it should be predicting).
So for the encoder:

padding = [[0, 0], [0, 0], [0, 0]]

and for the decoder:

padding = [[0, 0], [(filter_width - 1) * dilation/2, (filter_width - 1) * dilation/2], [0, 0]]
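Spelled out as code, the proposal reads as follows (a sketch of the suggestion only, not the repo's code; note `//` rather than `/`, since under Python 3 the latter yields a float, which `tf.pad` rejects):

```python
filter_width, dilation = 3, 2

# Proposed encoder padding: add nothing explicitly and let
# conv1d's SAME scheme supply all the zeros.
encoder_padding = [[0, 0], [0, 0], [0, 0]]

# Proposed decoder padding: split the zeros evenly between
# the two sides of the time axis.
half = (filter_width - 1) * dilation // 2  # = 2 here
decoder_padding = [[0, 0], [half, half], [0, 0]]
```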
Another way to correct it is to keep the padding but change the padding scheme in `conv1d` from `SAME` to `VALID`.
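A minimal sketch of that alternative (assuming TensorFlow 2.x, where `tf.nn.conv1d` accepts a `dilations` argument; the repo's TF 1.x code implements dilation via the reshape trick instead):

```python
import tensorflow as tf

filter_width, dilation = 3, 2
half = (filter_width - 1) * dilation // 2

x = tf.reshape(tf.range(1.0, 7.0), [1, 6, 1])  # [batch, width, channels]
w = tf.ones([filter_width, 1, 1])              # dummy filter

# Keep the wrapper's explicit symmetric padding, but use VALID
# so conv1d does not add a second round of zeros.
padded = tf.pad(x, [[0, 0], [half, half], [0, 0]])
y = tf.nn.conv1d(padded, w, stride=1, padding='VALID', dilations=dilation)
print(y.shape)  # (1, 6, 1): output width equals input width,
                # with the input padded exactly once
```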
The padding setting for the decoder is correct.
padding = [[0, 0], [(filter_width - 1) * dilation, 0], [0, 0]]
This is done to preserve causality. Refer to the bytenet decoder diagram for further clarity. Check the `causal_conv` function in:
https://github.com/ibab/tensorflow-wavenet/blob/master/wavenet/ops.py
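Condensed, the idea in that file looks like this (a paraphrase, not a verbatim copy: wavenet's `causal_conv` uses a time-to-batch reshape, while this sketch assumes a `conv1d` that takes `dilations` directly):

```python
import tensorflow as tf

def causal_conv(value, filters, filter_width, dilation):
    # Left-pad only: combined with VALID, this aligns the kernel
    # so that output t depends only on inputs at positions <= t.
    padding = [[0, 0], [(filter_width - 1) * dilation, 0], [0, 0]]
    padded = tf.pad(value, padding)
    # The output width equals the input width, with no centered
    # SAME padding that would shift future timesteps into view.
    return tf.nn.conv1d(padded, filters, stride=1,
                        padding='VALID', dilations=dilation)
```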
For the encoder, the current code is fine as well. If you are using an odd filter width (1, 3, 5, ...) it shouldn't be a problem. I think having a look at the diagram again might help. The first output of the encoder depends only on the first and the third element, which will be the case if

0 1 3 5 0
0 2 4 6 0

is the padded input for `conv1d`.
Let me know if you still think I may be wrong.
You are ignoring the padding from `conv1d` itself. The input

0 1 3 5 0
0 2 4 6 0

becomes the input to `conv1d`, which will again pad `filter_width - 1` zeros under the `SAME` padding scheme:

0 0 1 3 5 0 0
0 0 2 4 6 0 0
Oh, I see. I just went through the documentation of `tf.nn.conv1d`. It seems that the input may be padded differently for different input lengths, which is not desirable. Changing it to `VALID` seems appropriate. I would appreciate it if you could make the required change and test machine translation to check whether it improves.

Do you think this change might also be required for the decoder? That implementation is the same as wavenet, though.
@zcyang Updated the ops. The model works correctly now. It was indeed an error in the encoder padding. Check `Model/ops` for the new implementation.