TimeLlama
Decoding Parameters
I am trying to use TimeLlama-7b-chat from Hugging Face. I load the model with the following code:
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM, LlamaTokenizer

model_name = "chrisyuan45/TimeLlama-7b-chat"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
model = LlamaForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    quantization_config=quantization_config,
    device_map="auto",
    low_cpu_mem_usage=True,
)
tokenizer = LlamaTokenizer.from_pretrained(model_name)
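To rule out a loading problem, I also check the model's memory footprint after loading (this check is my own addition, not something from the repo):

# Sanity check: a 4-bit quantized 7B model should report roughly 4 GB.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")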
and the following for generation:
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")
ids = model.generate(
    input_ids,
    max_length=200,
    num_return_sequences=3,
    no_repeat_ngram_size=2,
    do_sample=True,  # needed: greedy search only supports num_return_sequences=1
)
output = [tokenizer.decode(ids[i], skip_special_tokens=True) for i in range(len(ids))]
So far the outputs are not very good, e.g. incomplete sentences or stray special characters. Could you please share the decoding parameters you used in your experiments? Thank you in advance!
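For reference, the only other setup I have tried is plain sampling with the common Llama-2-chat defaults and the [INST] prompt template; the values below are my own guesses, not anything taken from your paper:

# My guesses at reasonable sampling parameters (not the authors' settings).
prompt = f"[INST] {question} [/INST]"  # Llama-2 chat template; `question` is a placeholder for my actual query
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
ids = model.generate(
    **inputs,
    max_new_tokens=200,   # counts only generated tokens, unlike max_length
    do_sample=True,
    temperature=0.6,      # Llama-2 reference sampling defaults
    top_p=0.9,
    num_return_sequences=3,
)
output = tokenizer.batch_decode(ids, skip_special_tokens=True)

If the prompt format matters more than the generation call, a pointer to the exact template you used would also help.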