StableLM

More than 4096 context length?

Open • StoyanStAtanasov opened this issue 2 years ago • 8 comments

Is it possible to have a larger context? That would let smaller models do more complicated things, because many of a smaller model's weaknesses can be compensated for by putting more data into the context. For example: help pages, datasheets, examples, thinking rules, longer conversations trying to fix an issue, etc.

Please excuse me if this is the wrong place to ask this question, but context length is very rarely discussed. Thanks in advance.

StoyanStAtanasov • Apr 19 '23 23:04

Sure, you just need to fine-tune it on a longer context :)

jon-tow • Apr 20 '23 00:04

@jon-tow You are joking right?

StoyanStAtanasov • Apr 20 '23 02:04

Nope; see https://github.com/kyleliang919/Long-context-transformers

jon-tow • Apr 20 '23 05:04

  1. The training code has to change.
  2. The data you fine-tune the model on after training has to change.

So no, nothing can be done on the user side to extend the attention span. (But you could use the model to summarize blocks of text and then feed the already summarized text as a whole to do your thing.)

NPap0 • Apr 20 '23 06:04
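A minimal sketch of the summarize-then-feed workaround, assuming a caller-supplied `summarize` function (hypothetical; in practice it would be a prompt to the model itself) and using whitespace-separated words as a rough proxy for tokens. The helper names `split_into_chunks` and `fit_to_context` are illustrative and not part of StableLM.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Words are only a stand-in for model tokens (an assumption: a real
    tokenizer usually yields more tokens than words).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def fit_to_context(
    text: str,
    summarize: Callable[[str], str],  # hypothetical model call: chunk -> shorter summary
    max_tokens: int,
    max_rounds: int = 10,
) -> str:
    """Summarize chunk by chunk, round after round, until text fits in max_tokens words."""
    rounds = 0
    while len(text.split()) > max_tokens:
        if rounds == max_rounds:
            raise ValueError("text did not shrink enough; summarize() may not reduce length")
        # Each chunk fits the budget on its own, so each summarize() call fits the context.
        text = " ".join(summarize(chunk) for chunk in split_into_chunks(text, max_tokens))
        rounds += 1
    return text
```

Each round shrinks the text only if `summarize` really returns something shorter than its input, so the loop is capped by `max_rounds`. Details dropped in one round cannot be recovered later, which is the main trade-off compared with a genuinely longer context.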

Pretty sure the answer is no due to how positional encoding is done.

mallorbc • Apr 20 '23 19:04

@mallorbc The model uses RoPE (rotary position embeddings), which encodes relative positions. See https://blog.eleuther.ai/rotary-embeddings/

jon-tow • Apr 20 '23 20:04
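To show what "relative" means here, below is a minimal pure-Python sketch of rotary embeddings. It assumes the interleaved-pair formulation with base 10000 from the original RoPE paper. StableLM's actual implementation may pair dimensions differently or rotate only part of each attention head. The names `rope` and `attention_score` are hypothetical.

```python
import math
from typing import List


def rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each dimension pair (2i, 2i+1) of x by angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("vector dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)  # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s  # standard 2D rotation of the pair (a, b)
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def attention_score(q: List[float], k: List[float], q_pos: int, k_pos: int) -> float:
    """Unscaled query/key score after applying RoPE at each token's position."""
    return dot(rope(q, q_pos), rope(k, k_pos))


q = [0.3, -1.2, 0.5, 0.8, -0.7, 0.1]
k = [1.0, 0.4, -0.6, 0.2, 0.9, -0.3]
# Same offset (7) at very different absolute positions gives the same score.
near = attention_score(q, k, 10, 3)
far = attention_score(q, k, 5000, 4993)
```

Because rotations compose, the score depends only on the offset between the two positions, so the encoding has no lookup table that caps the position index. That does not make longer inputs work for free: offsets larger than those seen during training are still out of distribution, which is consistent with the suggestion earlier in the thread to fine-tune on longer sequences.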

@jon-tow So models like GPT-J can be fine-tuned to generate more than their sequence length? Whenever I try to generate such sequences with GPT-J I run into issues, though maybe that is something unrelated.

mallorbc • Apr 20 '23 20:04