Fix the index i, the dropout layer mismatch, and the matrix multiplication incompatibility
This commit fixes the following problems:
Blocks have a linear layer at the end:
self.proj = nn.Linear(hidden_size, input_size)
This leads to an incompatibility when multiple blocks are stacked:
xLSTMBlock(embedding_size if i == 0 else hidden_size,
Because proj maps the hidden state back to input_size, every block outputs as many features as it received, so subsequent blocks actually receive embedding_size features rather than hidden_size. This is fixed by constructing every block with the same input size:
xLSTMBlock(embedding_size, hidden_size, num_layers, dropout, bidirectional, lstm_type)
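The size arithmetic behind this fix can be sketched in plain Python (the helper functions and the concrete sizes 128/256 are illustrative, not part of the commit):

```python
# Sketch (torch-free) of why the old size arithmetic breaks.
# Each xLSTMBlock ends with proj = nn.Linear(hidden_size, input_size),
# so a block always emits exactly as many features as it received.

def constructed_input_sizes(embedding_size, hidden_size, num_blocks):
    # Old code: xLSTMBlock(embedding_size if i == 0 else hidden_size, ...)
    return [embedding_size if i == 0 else hidden_size
            for i in range(num_blocks)]

def actual_input_sizes(embedding_size, num_blocks):
    # proj maps hidden_size back to input_size, so block i's output width
    # equals its input width; the first block is fed embedding_size
    # features, hence every later block receives embedding_size too.
    return [embedding_size] * num_blocks

print(constructed_input_sizes(128, 256, 3))  # [128, 256, 256]
print(actual_input_sizes(128, 3))            # [128, 128, 128]
```

From block 1 onward the constructed input width (256) disagrees with the actual incoming width (128), which is exactly the matrix multiplication incompatibility; passing embedding_size to every block makes the two lists agree.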
The second problem was the dropout layer:
for i, (lstm, dropout, f_gate, i_gate) in enumerate(zip(self.lstms, self.dropout_layers, self.exp_forget_gates, self.exp_input_gates)):
Here, self.dropout_layers does not have the same length as the other sequences in the zip; since zip silently truncates to the shortest iterable, some layers are skipped. This is fixed by keeping only one dropout layer, which is applied conditionally.
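The truncation behaviour and the single-dropout fix can be sketched in plain Python (the string names stand in for the real nn.LSTM / nn.Dropout modules, and the "no dropout after the last layer" condition is an assumed convention, not necessarily the commit's exact condition):

```python
# Toy demonstration of the zip truncation bug and the fixed loop shape.
lstms = ["lstm0", "lstm1", "lstm2"]
dropout_layers = ["drop0", "drop1"]          # one entry short

# Buggy loop: zip stops at the shortest iterable, so "lstm2" is skipped.
visited_buggy = [lstm for lstm, _ in zip(lstms, dropout_layers)]
print(visited_buggy)        # ['lstm0', 'lstm1']

# Fixed loop: a single shared dropout, applied conditionally
# (here: after every layer except the last), and no layer is skipped.
visited_fixed = []
dropout_applied_after = []
for i, lstm in enumerate(lstms):
    visited_fixed.append(lstm)
    if i < len(lstms) - 1:   # assumed condition: skip dropout after last layer
        dropout_applied_after.append(lstm)
print(visited_fixed)         # ['lstm0', 'lstm1', 'lstm2']
print(dropout_applied_after) # ['lstm0', 'lstm1']
```

With one shared dropout module there is no second list to keep in sync, so the mismatch cannot recur.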
The third problem is the already known index problem that others have pointed out.