yingying123321

5 comments by yingying123321

I just found this in the PyTorch docs (see the attached picture). It shows that for fc = nn.Linear(d_model, n_trg_vocab), the shape of fc's weight is actually (n_trg_vocab,...
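A quick way to check this yourself is to print the weight shape. This is only a minimal sketch: the sizes 512 and 1000 are made-up values for illustration, not taken from the original thread. It relies on the documented behavior that nn.Linear stores its weight as (out_features, in_features) and applies its transpose in the forward pass.

```python
import torch
from torch import nn

# Illustrative sizes only (assumed, not from the original comment).
d_model, n_trg_vocab = 512, 1000

fc = nn.Linear(d_model, n_trg_vocab)

# nn.Linear stores weight as (out_features, in_features).
print(fc.weight.shape)  # torch.Size([1000, 512])

# The forward pass multiplies by the transposed weight and adds the bias.
x = torch.randn(2, d_model)
y = fc(x)
print(y.shape)  # torch.Size([2, 1000])
```

So the first argument of nn.Linear ends up as the second dimension of the weight, which is easy to get backwards when sharing weights with an embedding layer.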

Through the model's past_key_value parameter; past_key_value gets concatenated to the key and value of every layer.
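Roughly, the concatenation looks like the sketch below. This is not the model's actual code: the tensor layout (batch, heads, seq_len, head_dim), the sizes, and the helper name extend_cache are all assumptions made for illustration, and the cached entries are assumed to come before the new ones along the sequence axis.

```python
import torch

# All sizes are illustrative assumptions.
num_layers, batch, heads, head_dim = 2, 1, 2, 4
past_len, new_len = 3, 1

# One (key, value) pair per layer, as cached from earlier decoding steps.
past_key_value = [
    (torch.randn(batch, heads, past_len, head_dim),
     torch.randn(batch, heads, past_len, head_dim))
    for _ in range(num_layers)
]

def extend_cache(past, new):
    """Hypothetical helper: join cached and new tensors along the sequence axis."""
    return torch.cat([past, new], dim=-2)

updated = []
for past_key, past_value in past_key_value:
    # Key and value computed for the new token(s) in this layer.
    new_key = torch.randn(batch, heads, new_len, head_dim)
    new_value = torch.randn(batch, heads, new_len, head_dim)
    updated.append((extend_cache(past_key, new_key),
                    extend_cache(past_value, new_value)))

print(updated[0][0].shape)  # torch.Size([1, 2, 4, 4])
```

Each layer's key and value grow by new_len along the sequence dimension, so attention for the new tokens can see all the cached positions.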