Inquiry about the maximum number of tokens that Llama can handle
I am wondering if there is a limit to the number of tokens that Llama can handle. I am planning to use the model for a project that requires handling a large amount of text data, and I want to make sure that I don't exceed the maximum token limit.
I have searched the documentation, but I couldn't find any information on this topic, so I am hoping that someone from the Llama team can help me with this inquiry.
If there is a limit, can you please provide the details on the maximum number of tokens that Llama can handle, and any suggestions on how to optimize my use of the model to work within this limit?
Thank you very much for your assistance.
It was trained with a context length of 2048 tokens, so you can use up to that. If you want to use more tokens, you will need to fine-tune the model so that it supports longer sequences.
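For the original question about staying within that limit, here is a minimal sketch, assuming you are using the Hugging Face `transformers` conversion of the weights (the checkpoint path is a placeholder):

```python
# Minimal sketch, assuming the Hugging Face `transformers` port of LLaMA;
# the checkpoint path below is a placeholder.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-7b-hf")

text = "..."  # your long input
# Count tokens to check whether the input fits in the 2048-token context.
n_tokens = len(tokenizer(text)["input_ids"])

# Or let the tokenizer truncate to the model's limit directly.
inputs = tokenizer(text, truncation=True, max_length=2048, return_tensors="pt")
```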
Can you actually just "fine tune more context size"?
@teknium1 how would one do so? I am a bit of a newbie here, is the process easy? Else I may find help elsewhere.
That's why I asked; I didn't know this was possible either. After some research, it seems it probably is possible. I still don't know how to do it or what it would require, though.
@teknium1 I've done a bit of research, and what you really want to do is increase the size of the "context window", which is effectively how many tokens the model can see before it forgets. For GPT-4 this has been raised to roughly 8,000 tokens. To go above this, you have to completely retrain the model, which would require tens of thousands of dollars' worth of equipment and is pretty much out of reach for consumers. I was also thinking you could have a separate instance view chunks of the conversation and then return that information to the main model, a bit like how the mind works (see the sketch below). Please bear in mind I am not an ML scientist; this just seems like a practical approach to me.
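A rough sketch of that chunk-and-summarize idea, purely illustrative; `summarize()` here is a hypothetical helper backed by whatever model or API you have available:

```python
# Illustrative only: compress older conversation turns into summaries so that
# the summaries plus the most recent turns fit inside a fixed context window.
# summarize() is a hypothetical helper, not part of any library.
def compress_history(messages, keep_recent=10, chunk_size=10):
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summaries = []
    for i in range(0, len(older), chunk_size):
        chunk = "\n".join(older[i:i + chunk_size])
        summaries.append(summarize(chunk))  # hypothetical summarization call
    return "\n".join(summaries + recent)
```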
@glample do you have a source for the 2048 max token limit? I didn't find any info on this
I remember Meta saying 4096 at one point but I haven't tested this yet.
I'm hearing you can just raise the max sequence length and fine-tune it on longer prompts.
https://huggingface.co/docs/transformers/main/en/model_doc/llama#transformers.LlamaConfig `max_position_embeddings`
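If you want to check this programmatically, a minimal sketch (the checkpoint path is a placeholder for whichever converted LLaMA weights you are using):

```python
# Minimal sketch: read the configured maximum sequence length from a
# converted LLaMA checkpoint. The path below is a placeholder.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("path/to/llama-7b-hf")
print(config.max_position_embeddings)  # 2048 for the original LLaMA release
```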
Is there actually a parameter that has to be changed for the maximum length? Or is it just that the model never saw longer sequences during training, so predictions beyond that length get worse? If we fine-tune with a larger maximum length and add suitably long training data, will the usable context length be extended?
See my tweet on this topic: https://twitter.com/Teknium1/status/1654446899859177472
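As a rough sketch of the approach being discussed (not an official recipe), one could load the model with a larger `max_position_embeddings` and then fine-tune on sequences longer than 2048 tokens; since LLaMA uses rotary position embeddings there are no learned position weights to resize, but whether quality holds up at longer lengths is exactly the open question here:

```python
# Rough sketch, not an official recipe: raise the configured context length
# and fine-tune on longer sequences. The checkpoint path and new length are
# placeholders.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("path/to/llama-7b-hf")
config.max_position_embeddings = 4096  # assumed target, double the original

model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b-hf", config=config)
# ...then fine-tune `model` on training examples longer than 2048 tokens.
```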
Thanks - let's add this to the documentation / FAQ as well. Closing for now as it's been answered.