
Inquiry about the maximum number of tokens that Llama can handle

Open magicknight opened this issue 1 year ago • 11 comments

I am wondering if there is a limit to the number of tokens that LLaMA can handle. I am planning to use the model for a project that involves a large amount of text data, and I want to make sure that I don't exceed its maximum token limit.

I have searched the documentation, but I couldn't find any information on this topic. Therefore, I am hoping that someone from the Meta team can help me with this inquiry.

If there is a limit, can you please provide details on the maximum number of tokens that LLaMA can handle, and any suggestions on how to optimize my use of the model to work within this limit?

Thank you very much for your assistance.

magicknight commented Mar 07 '23 12:03

It was trained with a context length of 2048 tokens, so you can use up to that. If you want to use more tokens, you will need to fine-tune the model so that it supports longer sequences.

glample commented Mar 08 '23 00:03
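To make the 2048-token limit concrete, here is a minimal sketch of checking a prompt against it before generating. It assumes the Hugging Face `transformers` LLaMA tokenizer and a local tokenizer path; neither appears in the original thread.

```python
from transformers import LlamaTokenizer

MAX_CONTEXT = 2048  # context length the original LLaMA was trained with

# Hypothetical local path to a converted LLaMA tokenizer.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-tokenizer")

def fits_in_context(prompt: str, max_new_tokens: int = 256) -> bool:
    """Return True if the prompt plus the generation budget fits the window."""
    n_prompt_tokens = len(tokenizer.encode(prompt))
    return n_prompt_tokens + max_new_tokens <= MAX_CONTEXT
```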

Can you actually just fine-tune for a larger context size?

teknium1 commented Mar 14 '23 04:03

@teknium1 how would one do so? I am a bit of a newbie here; is the process easy? Otherwise I may look for help elsewhere.

couldbejake commented Mar 27 '23 07:03

> @teknium1 how would one do so? I am a bit of a newbie here; is the process easy? Otherwise I may look for help elsewhere.

That's why I asked; I too didn't know this was possible. After some research, it seems it probably is. I still don't know how to do it or what it would require, though.

teknium1 commented Mar 27 '23 10:03

@teknium1 I've done a bit of research, and what you really want to do is increase the size of the "context window": this is effectively how many tokens the model can see before it forgets. For GPT-4 this has been raised to roughly 8,000 tokens. To go above this, you have to completely retrain the model, which would require tens of thousands of dollars' worth of equipment and is pretty much out of reach for consumers. I was thinking about this, and it may be possible to have a separate instance view chunks of the conversation and then return that information to the main model, a bit like how the mind works. Please bear in mind that I am not an ML scientist, but this seems practical to me.

couldbejake commented Mar 27 '23 13:03
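The chunking idea above can be sketched as a simple map-reduce over the conversation history: summarize each slice with its own model call, then hand the concatenated summaries to the main prompt. This is only an illustration of the commenter's suggestion; `generate_text` is a hypothetical stand-in for whatever inference function is used, and the chunk size is chosen to stay well under a 2048-token window.

```python
def summarize_history(history: str, generate_text, chunk_chars: int = 4000) -> str:
    """Summarize a long conversation slice by slice so each call fits the window."""
    summaries = []
    for start in range(0, len(history), chunk_chars):
        piece = history[start:start + chunk_chars]
        # One call per chunk; each stays well inside the context limit.
        summaries.append(generate_text("Summarize this exchange:\n" + piece))
    # The main model sees only the concatenated summaries, not the raw history.
    return "\n".join(summaries)
```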

@glample do you have a source for the 2048 max token limit? I didn't find any info on this.

MoritzLaurer commented Apr 05 '23 20:04

I remember Meta saying 4096 at one point, but I haven't tested this yet.

Delcos commented Apr 08 '23 22:04

I'm hearing you can just raise the max sequence length and fine-tune it on longer prompts.

teknium1 commented Apr 13 '23 00:04

> @glample do you have a source for the 2048 max token limit? I didn't find any info on this.

See `max_position_embeddings` in the LlamaConfig docs: https://huggingface.co/docs/transformers/main/en/model_doc/llama#transformers.LlamaConfig

SKRohit commented Apr 17 '23 18:04
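For reference, that limit can be read straight off the config of a converted checkpoint. This assumes the Hugging Face `transformers` API; the model path is a placeholder.

```python
from transformers import AutoConfig

# Placeholder path to a converted LLaMA checkpoint.
config = AutoConfig.from_pretrained("path/to/converted-llama")
print(config.max_position_embeddings)  # 2048 for the original LLaMA release
```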

Is there any parameter that needs to be changed for the maximum length? It may simply be that the training data never contained longer sequences, so predictions beyond that length degrade. If we fine-tune with a larger maximum length and add relevant long-sequence data, will the usable length be extended?

yuye2133 commented May 06 '23 02:05

> Is there any parameter that needs to be changed for the maximum length? It may simply be that the training data never contained longer sequences, so predictions beyond that length degrade. If we fine-tune with a larger maximum length and add relevant long-sequence data, will the usable length be extended?

See my tweet on this topic: https://twitter.com/Teknium1/status/1654446899859177472

teknium1 commented May 06 '23 06:05
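As a hedged sketch of the "raise the max length and fine-tune" recipe discussed in this thread: LLaMA uses rotary position embeddings, so there is no learned position table to resize, and one common approach is simply to raise `max_position_embeddings` in the config and then fine-tune on longer sequences. The paths and target length below are assumptions, not anything confirmed in the thread.

```python
from transformers import AutoConfig, LlamaForCausalLM

# Placeholder path to a converted LLaMA checkpoint.
config = AutoConfig.from_pretrained("path/to/converted-llama")
config.max_position_embeddings = 4096  # assumed target context length

model = LlamaForCausalLM.from_pretrained("path/to/converted-llama", config=config)
# Fine-tune `model` on sequences up to 4096 tokens with a standard training
# loop (e.g. transformers.Trainer) so it adapts to positions it never saw
# during pretraining; quality at the longer lengths is not guaranteed.
```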

Thanks - let's add this to the documentation/FAQ as well. Closing for now, as it's been answered.

jspisak commented Sep 06 '23 18:09