
Comparison with updated Based

Open obv-mikhail opened this issue 1 year ago • 2 comments

Based architecture seems to have been updated - https://arxiv.org/abs/2402.18668. Any insights into how it compares with Rebased?

obv-mikhail avatar Mar 10 '24 20:03 obv-mikhail

From our point of view, the updated arXiv version of Based is better seen as follow-up research on subquadratic architectures than as a simple upgrade. The new version combines linear attention with sliding window attention, which is orthogonal to the choice of linear attention kernel studied in our paper. Right now, we do not have evaluations of the ReBased kernel combined with sliding window attention.
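To illustrate why the two choices are independent, here is a minimal NumPy sketch, not the code from either paper. The function names, the single-head unbatched layout, and the two similarity functions are assumptions made for illustration: `taylor_sim` is a second-order Taylor approximation of `exp`, and `squared_sim` is a simplified stand-in for a quadratic kernel that omits any learnable normalization or affine parameters.

```python
import numpy as np

def taylor_sim(s):
    # Second-order Taylor approximation of exp(s); illustrative only.
    return 1.0 + s + 0.5 * s ** 2

def squared_sim(s):
    # Simplified quadratic similarity; no learnable parameters (assumption).
    return s ** 2

def causal_kernel_attention(q, k, v, sim, eps=1e-6):
    # Kernel attention written in its quadratic (masked) form for clarity.
    # The similarity function `sim` is the only kernel-specific part.
    n = q.shape[0]
    scores = sim(q @ k.T) * np.tril(np.ones((n, n)))
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)

def sliding_window_attention(q, k, v, window):
    # Causal softmax attention restricted to the last `window` positions.
    n, d = q.shape
    logits = (q @ k.T) / np.sqrt(d)
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    mask = (j <= i) & (j > i - window)
    logits = np.where(mask, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return (weights @ v) / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 8, 4))

# The sliding-window component does not depend on the kernel choice,
# so it is computed once and paired with either kernel.
local_out = sliding_window_attention(q, k, v, window=3)
for sim in (taylor_sim, squared_sim):
    global_out = causal_kernel_attention(q, k, v, sim)
    print(sim.__name__, global_out.shape, local_out.shape)
```

Swapping `sim` changes only the linear attention component, while the sliding window component is untouched, which is the sense in which the two design choices can be studied separately.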

kefirski avatar Mar 10 '24 21:03 kefirski

Hi, I've just finished training the small 124M model. It seems that replacing conv1d with sliding window attention is orthogonal to the Based/ReBased comparison, as we achieve a slightly better loss value. We will update our preprint, and we plan to release the training pipeline and weights. Stay tuned!

elephantmipt avatar Mar 13 '24 15:03 elephantmipt