jong-won-lee

Results: 1 comment by jong-won-lee

> Thanks @liuqiangict. So the query, key, and value weights are shared across all the attention heads of the same layer? They are different. If they are shared, weight size...
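
The weight-size point in the comment above can be checked with a quick count. This is only a sketch, assuming the standard multi-head attention layout in which each head projects the hidden vector from `d_model` down to `d_model / num_heads` dimensions; it is not taken from the repository under discussion. The function name `qkv_param_count` and the sizes 768 and 12 are hypothetical, chosen for illustration.

```python
def qkv_param_count(d_model: int, num_heads: int, shared_across_heads: bool) -> int:
    """Count the weights (biases ignored) in the Q, K and V projections of one layer."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    # One head needs three d_model x d_head matrices: one each for Q, K and V.
    per_head = 3 * d_model * d_head
    # Separate weights: every head has its own block. Shared: a single block is reused.
    return per_head if shared_across_heads else num_heads * per_head


# Hypothetical sizes: 768 hidden units split over 12 heads.
separate = qkv_param_count(768, 12, shared_across_heads=False)
shared = qkv_param_count(768, 12, shared_across_heads=True)
print(separate, shared)  # 1769472 147456
```

With separate per-head weights, the three projections together hold `3 * d_model * d_model` values, which is what a single `d_model x d_model` matrix per projection would give. If the heads shared one block, the count would be `num_heads` times smaller, so the stored weight shapes show which case applies.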