Szymon Tworkowski

Results 21 comments of Szymon Tworkowski

Hi, thanks for your interest in our work! That's right, we currently support only inference. We are considering releasing examples for finetuning our models via the PyTorch/Hugging Face API.

The continued pretraining pipeline (used to train the long_llama_3b base model) is based on EasyLM. We are planning to release instruction tuning code in PyTorch & checkpoints & examples early next...

We are working on LongLLaMA v2, which will be a bigger release. After that we will release the pretraining code, which is in JAX and based on the EasyLM codebase - same...

In case you haven't seen it, the instruction tuning code is already there! See https://twitter.com/s_tworkowski/status/1687620785379360768 and the READMEs in this repo for more details.

We haven't compared inference time to LongChat since we haven't tried 7B/13B LongLLaMA models - they are yet to come. The advantage of our approach is that long context...

Roughly speaking, both at training and inference time LongLLaMA uses only around 10% of its layers for long context. This means we save about 80-90% of the FLOPs spent on attention....
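The arithmetic behind that estimate can be sketched as a back-of-envelope calculation. The layer counts, context lengths, and model width below are illustrative assumptions, not LongLLaMA's exact configuration; the point is just that attention cost scales with context length per layer, so restricting long context to a small subset of "memory" layers removes most of the attention FLOPs.

```python
# Back-of-envelope estimate of attention FLOPs saved when only a small
# fraction of layers attends over the long context. All concrete numbers
# here are illustrative assumptions, not LongLLaMA's exact architecture.

def attention_flops(n_layers, seq_len, ctx_len, d_model):
    """Rough FLOPs of the QK^T and AV matmuls across n_layers:
    two matmuls, each ~2 * seq_len * ctx_len * d_model multiply-adds."""
    return n_layers * 4 * seq_len * ctx_len * d_model

def fraction_saved(n_layers, memory_layers, local_len, long_len,
                   seq_len=2048, d_model=4096):
    # Baseline: every layer attends over the full long context.
    full = attention_flops(n_layers, seq_len, long_len, d_model)
    # Memory-layer setup: most layers attend only locally; just
    # `memory_layers` of them see the long context.
    mixed = (attention_flops(n_layers - memory_layers, seq_len, local_len, d_model)
             + attention_flops(memory_layers, seq_len, long_len, d_model))
    return 1 - mixed / full

# e.g. 26 layers with 3 memory layers, 2K local vs 64K long context
print(f"{fraction_saved(26, 3, 2048, 65536):.0%} of attention FLOPs saved")
```

With these assumed numbers the saving comes out around 86%, in the 80-90% ballpark mentioned above; the exact figure depends on the layer split and context lengths.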

I'm not sure I understand the question. The experiment I will run is the following: we take an input consisting of 128K tokens and feed it into both LLaMA 7B and...

Which LongLLaMA checkpoint did you use? (There are base v1, base v1.1, and instruct.) I agree LongLLaMA is a research preview and is not as competitive as closed source model...

Thank you for the clarification. Could you be more specific about which checkpoint you tried? I am afraid the LLaMA 3B base model does not have much summarization capability, and...

Hi, thanks for your interest in our work! From my understanding of the LongNet paper, the main idea of FoT - training on negative examples while utilizing longer context - ...