saurabhkoshatwar

Results 4 comments of saurabhkoshatwar

@ByronHsu @yundai424 @Tcc0403 @qingquansong As discussed in the issue, the RoPE implementation in DeepSeek is different.

DeepSeek:
```python
cos = cos[position_ids].unsqueeze(unsqueeze_dim)
sin = sin[position_ids].unsqueeze(unsqueeze_dim)
b, h, s, d = q.shape
...
```

#take @ByronHsu I’d like to make an attempt

#take @ByronHsu @qingquansong, I’d like to make an attempt. Could you please assign it to me?