codellama
What is the maximum length that codellama-2-7B can generate?
I was running inference with codellama-2-7B.
Here is my code:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(self.device)
generate_ids = model.generate(input_ids, max_new_tokens=1024, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
output = tokenizer.decode(generate_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
I want to know the maximum value that max_new_tokens can be set to.
The base model supports a total of 4096 tokens, so you can set max_new_tokens to at most 4096. Keep in mind that this 4096-token budget includes the tokens of the prompt as well.
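For example, one way to respect that budget is to subtract the prompt length before calling generate. This is just a sketch reusing the tokenizer, model, prompt, and device from the question, and assuming the 4096-token context window quoted above:

```python
MAX_CONTEXT = 4096  # assumed total context window (prompt + generated tokens)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(self.device)
prompt_len = input_ids.shape[1]            # number of tokens in the prompt
budget = max(MAX_CONTEXT - prompt_len, 0)  # tokens left for generation

generate_ids = model.generate(
    input_ids,
    max_new_tokens=min(1024, budget),      # never request more than fits
    num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,
)
```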
Hi @Uestc-Young, please note that since generation is auto-regressive, the maximum length for generation is the maximum sequence length supported minus the length of the prompt. There is a max_seq_len argument that you can specify when you build the model, and you can set this parameter to up to 100000 (but, depending on your GPU, you may run into memory issues and should then go with a lower value).
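For reference, here is a minimal sketch of passing max_seq_len when building the model with this repo's Llama.build API, modeled on the example scripts; the checkpoint and tokenizer paths are placeholders for your local files:

```python
from llama import Llama

# Build the generator with a larger context window.
# ckpt_dir and tokenizer_path are placeholders for your local checkpoint.
generator = Llama.build(
    ckpt_dir="CodeLlama-7b/",
    tokenizer_path="CodeLlama-7b/tokenizer.model",
    max_seq_len=16384,   # can be raised toward 100000, memory permitting
    max_batch_size=1,
)
```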