Amine Abdaoui

Thanks, @liuqiangict. So the query, key, and value weights are shared across all the attention heads of the same layer?