
Would LongNet be easily applied to the attention used in FoT?

Open · jebarpg opened this issue 1 year ago · 1 comment

https://arxiv.org/abs/2307.02486 The LongNet paper, which scales to a 1-billion-token context length, seems like it could be combined with this work in the pursuit of effectively unbounded context. FoT also feels similar to L2P (Learning to Prompt), which maintains a pool of prompts to mitigate forgetting during continual learning. Perhaps the kNN-accessed database of key-value pairs could be blended with an L2P-style prompt pool, and LongNet's dilation algorithm could benefit from contrastive learning as well.
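To make the kNN-over-KV idea concrete, here is a minimal sketch of attending over an external key/value cache retrieved with kNN, in the spirit of FoT-style memory attention. This is not the long_llama implementation; the function name, shapes, and `k=32` are all illustrative assumptions.

```python
# Minimal sketch (not the long_llama code): attention over an external
# key/value cache, restricted to the kNN matches of the current query.
import torch
import torch.nn.functional as F

def knn_memory_attention(q, mem_keys, mem_values, k=32):
    """q: (heads, dim); mem_keys / mem_values: (num_entries, heads, dim)."""
    # Score every cached key against the current query, per head.
    scores = torch.einsum("hd,nhd->hn", q, mem_keys)            # (heads, num_entries)
    # Keep only the top-k most similar cache entries per head.
    top_scores, top_idx = scores.topk(k, dim=-1)                # (heads, k)
    weights = F.softmax(top_scores / q.shape[-1] ** 0.5, dim=-1)
    # Gather the matching values and aggregate them with the attention weights.
    heads = q.shape[0]
    gathered = mem_values[top_idx, torch.arange(heads)[:, None]]  # (heads, k, dim)
    return torch.einsum("hk,hkd->hd", weights, gathered)

# Toy usage: 8 heads, 64-dim, 1024 cached (key, value) pairs.
heads, dim, n = 8, 64, 1024
out = knn_memory_attention(torch.randn(heads, dim),
                           torch.randn(n, heads, dim),
                           torch.randn(n, heads, dim))
print(out.shape)  # torch.Size([8, 64])
```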

Thoughts?

jebarpg · Jul 08 '23 00:07

Hi, thanks for your interest in our work! From my understanding of the LongNet paper, the main idea of FoT (training on negative examples while utilizing a longer context) and LongNet's dilated attention are largely orthogonal, which would make combining the two methods an interesting research direction to explore!
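A rough sketch of why they look orthogonal, based on my reading of the LongNet paper rather than its reference code: dilated attention only changes *which* positions each segment attends to, so it is independent of how the model is trained (e.g. FoT's negative / cross-batch examples). Segment length and dilation values below are illustrative.

```python
# Rough sketch of LongNet-style dilation: split the sequence into segments
# and, within each segment, keep every `dilation`-th position as an
# attention target. Only the attention pattern changes, not the training.
import torch

def dilated_indices(seq_len, segment_len, dilation):
    """Return, per segment, the sparse set of positions it attends to."""
    idx = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        idx.append(torch.arange(start, end, dilation))
    return idx

for seg in dilated_indices(seq_len=16, segment_len=8, dilation=2):
    print(seg.tolist())
# [0, 2, 4, 6]
# [8, 10, 12, 14]
```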

syzymon · Jul 08 '23 12:07