Szymon Tworkowski

Results 21 comments of Szymon Tworkowski

Hi, thanks for your interest in our work! That's right, we currently support only inference. We are considering releasing examples for finetuning our models via the PyTorch/Hugging Face API.

The continued pretraining pipeline (used to train the long_llama_3b base model) is based on EasyLM. We are planning to release instruction tuning code in PyTorch & checkpoints & examples early next...

We are working on LongLLaMA v2, which will be a bigger release. After that we will release the pretraining code, which is in JAX and based on the EasyLM codebase - same...

In case you haven't seen it, the instruction tuning code is already there! See https://twitter.com/s_tworkowski/status/1687620785379360768 and the READMEs in this repo for more details.

We haven't compared inference time to LongChat since we haven't tried 7B/13B LongLLaMA models - they are yet to come. The advantage of our approach is that long context...

Roughly speaking, both at training and inference time LongLLaMA uses only around 10% of its layers for long context. This means we save about 80-90% of the FLOPs spent on attention....
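The arithmetic behind that estimate can be sketched as a back-of-envelope calculation. The layer counts, context lengths, and model width below are illustrative assumptions, not LongLLaMA's exact configuration; the point is just that attention cost scales with context length per layer, so restricting long context to a small subset of "memory" layers removes most of the attention FLOPs.

```python
# Back-of-envelope estimate of attention FLOPs saved when only a small
# fraction of layers attends over the long context. All concrete numbers
# here are illustrative assumptions, not LongLLaMA's exact architecture.

def attention_flops(n_layers, seq_len, ctx_len, d_model):
    """Rough FLOPs of the QK^T and AV matmuls across n_layers:
    two matmuls, each ~2 * seq_len * ctx_len * d_model multiply-adds."""
    return n_layers * 4 * seq_len * ctx_len * d_model

def fraction_saved(n_layers, memory_layers, local_len, long_len,
                   seq_len=2048, d_model=4096):
    # Baseline: every layer attends over the full long context.
    full = attention_flops(n_layers, seq_len, long_len, d_model)
    # Memory-layer setup: most layers attend only locally; just
    # `memory_layers` of them see the long context.
    mixed = (attention_flops(n_layers - memory_layers, seq_len, local_len, d_model)
             + attention_flops(memory_layers, seq_len, long_len, d_model))
    return 1 - mixed / full

# e.g. 26 layers with 3 memory layers, 2K local vs 64K long context
print(f"{fraction_saved(26, 3, 2048, 65536):.0%} of attention FLOPs saved")
```

With these assumed numbers the saving comes out around 86%, in the 80-90% ballpark mentioned above; the exact figure depends on the layer split and context lengths.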

I'm not sure I understand the question. The experiment I will run is the following: we take an input consisting of 128K tokens and feed it into both LLaMA 7B and...

Which LongLLaMA checkpoint did you use? (There are base v1, base v1.1, and instruct.) I agree LongLLaMA is a research preview and is not as competitive as closed source model...

Thank you for the clarification. Could you be more specific about which checkpoint you tried? I am afraid the LLaMA 3B base model does not have much summarization capability, and...

Hi, thanks for your interest in our work! From my understanding of the LongNet paper, the main idea of FoT - training on negative examples while utilizing longer context - ...