jong-won-lee
> Thanks @liuqiangict. So the query, key and value weights are shared across all the attention heads of the same layer?

They are different. If they are shared, weight size...
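To make the size argument concrete, here is a minimal sketch that counts the parameters of one projection (query, key, or value) under the two readings of the question. The thread does not say which model is being discussed, so the sizes `d_model = 512` and `num_heads = 8` and the helper `projection_params` are illustrative assumptions. The layout assumed is standard multi-head attention, where each head owns a `d_model x d_head` slice.

```python
# Hypothetical sizes for illustration; they are not taken from the thread.
d_model = 512                   # hidden size (assumed)
num_heads = 8                   # attention heads per layer (assumed)
d_head = d_model // num_heads   # per-head dimension, 64 with these sizes


def projection_params(per_head: bool) -> int:
    """Weight count (biases ignored) of one projection: query, key, or value."""
    # With per-head weights, every head owns a d_model x d_head matrix.
    # With shared weights, a single d_model x d_head matrix serves all heads.
    matrices = num_heads if per_head else 1
    return matrices * d_model * d_head


separate = projection_params(per_head=True)
shared = projection_params(per_head=False)

print(separate, shared)
```

With these assumed sizes, the per-head case gives the same count as one full `d_model x d_model` matrix, while the shared case is smaller by a factor of `num_heads`. Comparing a layer's actual projection weight shape against these two counts is one way to check which case applies.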