MoChA-pytorch
PyTorch Implementation of "Monotonic Chunkwise Attention" (ICLR 2018)
Excuse me, are there any trained weights or training code available?
I tried this MonotonicAttention in my seq2seq model, which works well with vanilla attention, but after training for a while it still ran into the NaN-gradient issue. I checked the...
I think `energy = self.tanh(self.W(encoder_outputs) + self.V(decoder_h).repeat(sequence_length, 1) + self.b)` should be written as `energy = self.tanh(self.W(encoder_outputs) + self.V(decoder_h).repeat(1,sequence_length).reshape(batch_size*sequence_length,-1) + self.b)`
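A small sketch of why this matters (hypothetical shapes, not taken from the repo): on a `(batch, dim)` tensor, `.repeat(sequence_length, 1)` tiles the whole batch, interleaving rows from different batch items, while `.repeat(1, sequence_length).reshape(...)` keeps each batch item's copies contiguous, matching an encoder output flattened batch-major to `(batch*seq, dim)`.

```python
import torch

batch_size, sequence_length, dim = 2, 3, 4
decoder_h = torch.arange(batch_size * dim, dtype=torch.float).reshape(batch_size, dim)

# Original code: tiles the entire batch seq times,
# so rows come out interleaved as b0, b1, b0, b1, ...
wrong = decoder_h.repeat(sequence_length, 1)          # (batch*seq, dim)

# Proposed fix: repeat along the feature dim, then reshape,
# so each batch item's copies are contiguous: b0, b0, b0, b1, b1, b1.
right = (decoder_h.repeat(1, sequence_length)          # (batch, dim*seq)
                  .reshape(batch_size * sequence_length, -1))
```

With the original form, row 1 of the repeated tensor belongs to batch item 1 and gets added to batch item 0's encoder state, silently mixing batch elements.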
cumprod in the MoChA paper is defined to be exclusive, while `safe_cumprod` in this repo is not. Shouldn't it be: ```python def safe_cumprod(self, x, exclusive=False): """Numerically stable cumulative product...
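For reference, a minimal standalone sketch of what an exclusive, numerically stable cumprod could look like (this is an assumed implementation, not the repo's code): an exclusive cumprod shifts the result right by one, with `output[i] = x[0] * ... * x[i-1]` and `output[0] = 1`, matching `tf.math.cumprod(..., exclusive=True)` used by the original MoChA implementation.

```python
import torch

def safe_cumprod(x, exclusive=False, eps=1e-10):
    """Numerically stable cumulative product via exp(cumsum(log(x))).

    Assumes x holds probabilities in [0, 1]. With exclusive=True,
    output[i] = x[0] * ... * x[i-1] and output[0] = 1.
    """
    logs = torch.log(torch.clamp(x, min=eps, max=1.0))
    if exclusive:
        # Shift right along the last dim: prepend log(1) = 0, drop the last term.
        logs = torch.cat([torch.zeros_like(logs[..., :1]), logs[..., :-1]], dim=-1)
    return torch.exp(torch.cumsum(logs, dim=-1))
```

The log-space trick avoids underflow when many small probabilities are multiplied; clamping below by `eps` keeps `log` finite and the gradient well-defined, which is also relevant to the NaN-gradient report above.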
Is the attention returned by MonotonicAttention.soft() a probability distribution? It seems not; the following code: ``` from attention import MonotonicAttention monotonic = MonotonicAttention().cuda() batch_size = 1 sequence_length= 5 enc_dim,...
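One way to frame this question: a hypothetical helper (not from the repo) that checks the two defining properties of a distribution over the sequence dimension. Note that, per the monotonic attention formulation, the weights need not sum to exactly 1: probability mass can remain unallocated if the model never "attends", so a sum strictly below 1 is not necessarily a bug.

```python
import torch

def is_prob_dist(alpha, dim=-1, atol=1e-5):
    """Check that alpha is non-negative and sums to 1 along `dim`.

    Hypothetical diagnostic helper; monotonic attention weights may
    legitimately sum to less than 1 (residual mass = "never attend").
    """
    nonneg = bool((alpha >= 0).all())
    target = torch.ones_like(alpha.sum(dim=dim))
    sums_to_one = bool(torch.allclose(alpha.sum(dim=dim), target, atol=atol))
    return nonneg and sums_to_one
```

Running such a check on the output of `soft()` distinguishes "sums to less than 1 by design" from an actual normalization bug.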