yingying123321
Results: 5 comments of yingying123321
I just found this in the PyTorch docs (see the attached picture). It shows that for fc = nn.Linear(d_model, n_trg_vocab), the shape of fc's weight is actually (n_trg_vocab,...
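For reference, a minimal sketch of how this can be checked directly. PyTorch documents the weight of nn.Linear(in_features, out_features) as stored with shape (out_features, in_features), and the layer computes x @ weight.T + bias. The sizes 512 and 1000 are illustrative assumptions, not values from the comment above.

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not taken from the original comment).
d_model, n_trg_vocab = 512, 1000

# in_features = d_model, out_features = n_trg_vocab
fc = nn.Linear(d_model, n_trg_vocab)

# The weight is stored as (out_features, in_features).
weight_shape = tuple(fc.weight.shape)

# The forward pass multiplies by the transposed weight.
x = torch.randn(2, d_model)
y = fc(x)
manual = x @ fc.weight.T + fc.bias
output_shape = tuple(y.shape)

print(weight_shape, output_shape)
```

Because the stored weight already has the vocabulary size as its first dimension, it lines up with an nn.Embedding(n_trg_vocab, d_model) weight, which is why the two can be shared without a transpose.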
I have the same issue here.
Use the model's past_key_value parameter; past_key_value is concatenated to the key and value of every layer.
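A rough sketch of what that concatenation looks like inside a single attention layer. The tensor layout (batch, heads, seq_len, head_dim) and all sizes are assumptions chosen for illustration; a real model may lay out its cache differently.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
batch, heads, head_dim = 1, 2, 4
past_len, new_len = 3, 1

# Cached key/value from earlier steps (assumed layout: batch, heads, seq, head_dim).
past_key = torch.randn(batch, heads, past_len, head_dim)
past_value = torch.randn(batch, heads, past_len, head_dim)

# Key/value/query computed for the new token(s) in this step.
new_key = torch.randn(batch, heads, new_len, head_dim)
new_value = torch.randn(batch, heads, new_len, head_dim)
query = torch.randn(batch, heads, new_len, head_dim)

# Concatenate the cache and the new entries along the sequence dimension.
key = torch.cat([past_key, new_key], dim=2)
value = torch.cat([past_value, new_value], dim=2)

# The new query attends over both the cached and the new positions.
scores = query @ key.transpose(-2, -1) / head_dim ** 0.5
attn_out = torch.softmax(scores, dim=-1) @ value

print(key.shape, attn_out.shape)
```

The same concatenation would be repeated in each layer with that layer's own cached key and value.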