2 issues by 刘晓群

```python
# lstm_output : [batch_size, n_step, n_hidden * num_directions(=2)], F matrix
def attention_net(self, lstm_output, final_state):
    batch_size = len(lstm_output)
    hidden_forward = final_state[0]
    hidden_backward = final_state[1]
    hidden_f_b = torch.cat((hidden_forward, hidden_backward), 1)
    hidden = hidden_f_b.view(batch_size, -1, 1)
    # hidden =...
```
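For context, here is a minimal sketch of how the concatenated forward/backward hidden state from the snippet above could be used as the attention query over `lstm_output`. It follows the usual Bi-LSTM attention pattern; the standalone function signature, the `n_hidden` value, and the dot-product scoring are assumptions for illustration, not the original model's code.

```python
import torch
import torch.nn.functional as F

def attention_net(lstm_output, final_state):
    # lstm_output : [batch_size, n_step, n_hidden * 2]  (bidirectional LSTM outputs)
    # final_state : [num_directions(=2), batch_size, n_hidden]  (last hidden states)
    batch_size = lstm_output.size(0)
    # concatenate forward and backward last hidden states -> [batch_size, n_hidden * 2]
    hidden_f_b = torch.cat((final_state[0], final_state[1]), 1)
    hidden = hidden_f_b.view(batch_size, -1, 1)                 # [batch_size, n_hidden * 2, 1]
    # dot-product score of every time step against the final hidden state
    attn_weights = torch.bmm(lstm_output, hidden).squeeze(2)    # [batch_size, n_step]
    soft_attn = F.softmax(attn_weights, dim=1)
    # context vector: attention-weighted sum of the LSTM outputs
    context = torch.bmm(lstm_output.transpose(1, 2),
                        soft_attn.unsqueeze(2)).squeeze(2)      # [batch_size, n_hidden * 2]
    return context, soft_attn

# toy shape check with hypothetical sizes
batch_size, n_step, n_hidden = 3, 4, 5
lstm_output = torch.randn(batch_size, n_step, n_hidden * 2)
final_state = torch.randn(2, batch_size, n_hidden)
context, attn = attention_net(lstm_output, final_state)
print(context.shape, attn.shape)   # torch.Size([3, 10]) torch.Size([3, 4])
```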

The prompt is a given sentence, and we don't need to predict the next token inside the prompt, so is it a problem if prompt tokens are allowed to see the tokens to their right? `x = self.attention(x, causal_mask=True)`
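To make the question concrete, here is a minimal sketch of what `causal_mask=True` typically does in a self-attention layer; the function name, single-head layout, and mask construction are illustrative assumptions, not the original model's API. With the mask, token i can only attend to positions ≤ i, even inside the prompt; without it, every prompt token sees the whole sentence. Decoder-only models usually keep the mask over the prompt as well, so that the same attention pattern used in training remains valid when tokens are generated one at a time.

```python
import torch
import torch.nn.functional as F

def self_attention(x, causal_mask=False):
    # x : [batch_size, seq_len, d_model]; bare single-head attention for illustration
    scores = torch.matmul(x, x.transpose(1, 2)) / (x.size(-1) ** 0.5)  # [b, seq, seq]
    if causal_mask:
        seq_len = x.size(1)
        # upper-triangular mask: position i may only attend to positions <= i
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float('-inf'))
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, x)

x = torch.randn(2, 5, 8)                           # e.g. a 5-token prompt
out_causal = self_attention(x, causal_mask=True)   # token i ignores tokens to its right
out_full   = self_attention(x, causal_mask=False)  # every token sees the whole prompt
print(out_causal.shape, out_full.shape)
```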