long_llama

LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
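For readers who want to try it, here is a minimal loading sketch. It assumes the checkpoint is published on Hugging Face (the `syzymon/long_llama_3b` name below is an assumption) and that the memory-augmented layers live in the repo's custom modeling code, which is why `trust_remote_code=True` is passed:

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

# Assumed checkpoint name; adjust to the model variant you actually use.
MODEL = "syzymon/long_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float32,
    trust_remote_code=True,  # loads the repo's custom memory-attention code
)

prompt = "My name is Julien and I like to"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```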

18 long_llama issues

How would you fine-tune in this style with an instruction fine-tuning dataset like Open-Orca?
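Not an official answer, but a rough sketch of how one might flatten Open-Orca records into plain-text training examples; the column names below are those of the public Open-Orca/OpenOrca dataset, and the FoT-specific training loop is not shown:

```python
from datasets import load_dataset

# Stream the public Open-Orca dataset and flatten each record into a single
# instruction-style text field suitable for causal-LM fine-tuning.
ds = load_dataset("Open-Orca/OpenOrca", split="train", streaming=True)

def to_text(example):
    return {
        "text": (
            f"{example['system_prompt']}\n\n"
            f"### Instruction:\n{example['question']}\n\n"
            f"### Response:\n{example['response']}"
        )
    }

train_stream = ds.map(to_text)
print(next(iter(train_stream))["text"][:200])
```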

In your paper, you say: "Position Interpolation (PI, [Chen et al., 2023] and [kaiokendev, 2023]) introduces a modification to the rotary positional encoding scheme that enables fine-tuning for 32K..."
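For context, Position Interpolation rescales position indices so that fine-tuning at a longer target context stays within the positional range seen during pre-training. A toy sketch (the train/target context lengths and dimensions below are illustrative, not the repo's code):

```python
import torch

def interpolated_rope_frequencies(seq_len, dim, train_ctx=2048, target_ctx=32768, base=10000.0):
    # Position Interpolation: instead of extrapolating to positions > train_ctx,
    # positions are rescaled by train_ctx / target_ctx so they remain inside
    # the range the rotary embeddings were trained on.
    scale = train_ctx / target_ctx
    positions = torch.arange(seq_len, dtype=torch.float32) * scale
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.outer(positions, inv_freq)      # (seq_len, dim/2)
    return torch.cos(angles), torch.sin(angles)    # fed into rotary attention
```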

That sounds massively interesting; while we try to run inference and read the paper, should we expect a release of the fine-tuning code?

If I use FAISS as the memory, then during inference each generated token requires 3 kNN searches (because there are 3 memory attention layers), right? Will generation become very slow?
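For illustration only, here is roughly what that per-token cost looks like with a flat FAISS index per memory layer; the key dimension, top-k, and pre-filled random keys are assumptions, not the repo's actual memory implementation:

```python
import numpy as np
import faiss

d, k, mem_layers = 128, 128, 3   # illustrative key dim, top-k, and layer count

# One flat inner-product index per memory attention layer, pre-filled with
# random "cached" keys standing in for previously processed context.
indexes = [faiss.IndexFlatIP(d) for _ in range(mem_layers)]
for index in indexes:
    index.add(np.random.rand(10_000, d).astype(np.float32))

def retrieve_for_token(per_layer_queries):
    """One generation step: each memory layer issues one kNN search."""
    results = []
    for index, q in zip(indexes, per_layer_queries):   # 3 searches per token
        scores, ids = index.search(q.reshape(1, d).astype(np.float32), k)
        results.append((scores, ids))
    return results

queries = [np.random.rand(d).astype(np.float32) for _ in range(mem_layers)]
_ = retrieve_for_token(queries)
```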

How much does the speed drop as the context length grows, compared with vanilla LLaMA?

https://arxiv.org/abs/2307.02486 This paper on scaling to a 1-billion-token context length, in addition to this work, seems like it would solve the pursuit of infinite context length. Also, FoT feels similar to L2P learn...

Hi! This is great work and I'm very interested in FoT. But I'm curious how it compares to RAG techniques. For example, would it be better to use...

It feels like this is not really about expanding the context window, but rather about enhancing it with the key-value pairs stored during training as external knowledge. This means that once the...

Hi, I am going through the page https://huggingface.co/syzymon/long_llama_code_7b_instruct. I found the text "All inputs were truncated and randomly padded (left/right) to 3072 tokens" under Training. Is there a reason behind this...
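A guess at what that preprocessing step looks like; the pad token id and the 50/50 left/right split are assumptions inferred from the quoted sentence, not the actual training code:

```python
import random

def truncate_and_random_pad(token_ids, target_len=3072, pad_id=0):
    """Truncate to target_len, then pad the remainder on a randomly
    chosen side (left or right), as the model card sentence describes."""
    token_ids = token_ids[:target_len]
    pad = [pad_id] * (target_len - len(token_ids))
    return pad + token_ids if random.random() < 0.5 else token_ids + pad

example = truncate_and_random_pad(list(range(10)), target_len=16)
```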

Hi, thank you for this great effort. I am trying to use your 3B m-instruct-v1_1 model to evaluate it on my custom long-context QA dataset with context lengths up to 200k...