
PyTorch Implementation of "Monotonic Chunkwise Attention" (ICLR 2018)

MoChA-pytorch issues (5)

Excuse me, are there any trained weights or training code available?

I tried this MonotonicAttention in my seq2seq model, which works well with vanilla attention, but after training for a while it still ran into the NaN gradient issue. I checked the...

I think `energy = self.tanh(self.W(encoder_outputs) + self.V(decoder_h).repeat(sequence_length, 1) + self.b)` should be written as `energy = self.tanh(self.W(encoder_outputs) + self.V(decoder_h).repeat(1, sequence_length).reshape(batch_size*sequence_length, -1) + self.b)`
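The shape issue this report describes can be demonstrated in isolation. The sketch below (a standalone toy, not the repo's actual code) shows that `repeat(sequence_length, 1)` tiles the whole batch and interleaves rows from different batch elements, while `repeat(1, sequence_length).reshape(...)` repeats each batch row consecutively, matching `encoder_outputs` flattened in batch-major order:

```python
import torch

batch_size, sequence_length, dim = 2, 3, 4
# toy "projected decoder state" self.V(decoder_h): one distinct row per batch element
v = torch.arange(batch_size * dim, dtype=torch.float32).reshape(batch_size, dim)

# buggy tiling: repeats the whole batch, so rows for different batch
# elements interleave as [b0, b1, b0, b1, b0, b1]
buggy = v.repeat(sequence_length, 1)

# proposed fix: each batch row is repeated sequence_length times
# consecutively, i.e. [b0, b0, b0, b1, b1, b1]
fixed = v.repeat(1, sequence_length).reshape(batch_size * sequence_length, -1)

# reference layout: explicit expand along a new time axis, then flatten
expected = v.unsqueeze(1).expand(batch_size, sequence_length, dim).reshape(-1, dim)

print(torch.equal(fixed, expected))   # True
print(torch.equal(buggy, expected))   # False (row order differs)
```

An alternative with the same effect is to keep a 3-D layout and let broadcasting add `self.V(decoder_h).unsqueeze(1)` to `(batch, seq_len, dim)` encoder projections, avoiding the flatten/repeat dance entirely.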

cumprod in the MoChA paper is defined to be exclusive, while `safe_cumprod` in this repo is not. Shouldn't it be:

```python
def safe_cumprod(self, x, exclusive=False):
    """Numerically stable cumulative product...
```

Is the attention returned by MonotonicAttention.soft() a probability distribution? It seems not; the following code:

```python
from attention import MonotonicAttention
monotonic = MonotonicAttention().cuda()
batch_size = 1
sequence_length = 5
enc_dim, ...
```
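The behavior this issue observes is actually expected from the MoChA/monotonic-attention math: the soft alignment alpha_i = p_i * prod_{j<i}(1 - p_j) can sum to less than 1, because the leftover mass prod_j(1 - p_j) is the probability of never attending to any position. A minimal sketch (simplified to a single decoder step, ignoring the recursion over the previous step's alignments) makes this concrete:

```python
import torch

def monotonic_alpha(p_choose):
    """Soft monotonic alignments for one decoder step.

    alpha_i = p_i * prod_{j<i} (1 - p_j): attend at position i iff every
    earlier position was skipped. The residual mass prod_j (1 - p_j) is
    the probability of never attending, so alpha need not sum to 1.
    """
    one_minus = 1.0 - p_choose
    # exclusive cumulative product of (1 - p): [1, (1-p0), (1-p0)(1-p1), ...]
    exclusive = torch.cat([torch.ones_like(one_minus[..., :1]),
                           torch.cumprod(one_minus, dim=-1)[..., :-1]], dim=-1)
    return p_choose * exclusive

p = torch.full((5,), 0.2)        # selection probability 0.2 at every position
alpha = monotonic_alpha(p)
print(alpha.sum())               # 1 - 0.8**5 ≈ 0.672, i.e. strictly < 1
```

So whether this is a bug depends on the intended semantics: some implementations renormalize or force attendance at the final encoder position to make alpha a proper distribution, while the paper's formulation leaves the residual mass in place.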