Fix the index i, the dropout layer mismatch, and the matrix multiplication incompatibility
This commit fixes the following problems:
Blocks have a linear layer at the end:
self.proj = nn.Linear(hidden_size, input_size)
This leads to an incompatibility when multiple blocks are stacked:
xLSTMBlock(embedding_size if i == 0 else hidden_size,
Because proj maps the hidden state back to input_size, every block outputs as many features as it received, so subsequent blocks actually receive embedding_size features rather than hidden_size. This is fixed by constructing every block with the same input size:
xLSTMBlock(embedding_size, hidden_size, num_layers, dropout, bidirectional, lstm_type)
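The size arithmetic behind this fix can be sketched in plain Python (the helper functions and the concrete sizes 128/256 are illustrative, not part of the commit):

```python
# Sketch (torch-free) of why the old size arithmetic breaks.
# Each xLSTMBlock ends with proj = nn.Linear(hidden_size, input_size),
# so a block always emits exactly as many features as it received.

def constructed_input_sizes(embedding_size, hidden_size, num_blocks):
    # Old code: xLSTMBlock(embedding_size if i == 0 else hidden_size, ...)
    return [embedding_size if i == 0 else hidden_size
            for i in range(num_blocks)]

def actual_input_sizes(embedding_size, num_blocks):
    # proj maps hidden_size back to input_size, so block i's output width
    # equals its input width; the first block is fed embedding_size
    # features, hence every later block receives embedding_size too.
    return [embedding_size] * num_blocks

print(constructed_input_sizes(128, 256, 3))  # [128, 256, 256]
print(actual_input_sizes(128, 3))            # [128, 128, 128]
```

From block 1 onward the constructed input width (256) disagrees with the actual incoming width (128), which is exactly the matrix multiplication incompatibility; passing embedding_size to every block makes the two lists agree.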
The second problem was the dropout layer:
for i, (lstm, dropout, f_gate, i_gate) in enumerate(zip(self.lstms, self.dropout_layers, self.exp_forget_gates, self.exp_input_gates)):
Here, self.dropout_layers does not have the same length as the other sequences in the zip; since zip silently truncates to the shortest iterable, some layers are skipped. This is fixed by keeping only one dropout layer, which is applied conditionally.
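The truncation behaviour and the single-dropout fix can be sketched in plain Python (the string names stand in for the real nn.LSTM / nn.Dropout modules, and the "no dropout after the last layer" condition is an assumed convention, not necessarily the commit's exact condition):

```python
# Toy demonstration of the zip truncation bug and the fixed loop shape.
lstms = ["lstm0", "lstm1", "lstm2"]
dropout_layers = ["drop0", "drop1"]          # one entry short

# Buggy loop: zip stops at the shortest iterable, so "lstm2" is skipped.
visited_buggy = [lstm for lstm, _ in zip(lstms, dropout_layers)]
print(visited_buggy)        # ['lstm0', 'lstm1']

# Fixed loop: a single shared dropout, applied conditionally
# (here: after every layer except the last), and no layer is skipped.
visited_fixed = []
dropout_applied_after = []
for i, lstm in enumerate(lstms):
    visited_fixed.append(lstm)
    if i < len(lstms) - 1:   # assumed condition: skip dropout after last layer
        dropout_applied_after.append(lstm)
print(visited_fixed)         # ['lstm0', 'lstm1', 'lstm2']
print(dropout_applied_after) # ['lstm0', 'lstm1']
```

With one shared dropout module there is no second list to keep in sync, so the mismatch cannot recur.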
The third problem is the already known index problem that others have pointed out.