accelerate
ValueError: prompt is on the meta device, we need a `value` to put in on 0.
We defined a new prompt tensor in the LlamaModel class:
self.prompt = torch.nn.parameter.Parameter(torch.randn(self.embed_dim), requires_grad=True)
When loading the LLaMA weights:
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
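The error comes from how device_map loading works: accelerate first instantiates the model with shape-only "meta" tensors, then fills each one from the checkpoint as it dispatches weights to devices. A parameter with no checkpoint entry is never filled, so there is no value to put on device 0. A minimal sketch of the meta-device behavior (plain torch, no accelerate required):

```python
import torch

# A normally constructed parameter has real storage.
prompt = torch.nn.Parameter(torch.randn(4096), requires_grad=True)
print(prompt.is_meta)  # False: real tensor with data

# Under device_map loading, new tensors start out like this instead:
# shape and dtype only, no storage, so nothing can be copied to a GPU.
meta_prompt = prompt.to("meta")
print(meta_prompt.is_meta)  # True: shape only, no data to dispatch
```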
ValueError: prompt is on the meta device, we need a value to put in on 0.
Not sure what the issue is here.

You have a new tensor with no corresponding weight in the checkpoint, so it does not work.
Because we added a new tensor with no corresponding weight in the original LLaMA checkpoint, this new tensor needs to be initialized randomly during training.

When we load the model with:
model = LlamaForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
)
printing the tensor gives:
tensor([-0.1156, 0.5072, -0.8639, ..., 0.0728, -1.5193, -0.7287], requires_grad=True)
OK, no problem.

But when we load the model with:
model = LlamaForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
printing the tensor gives:
tensor(..., device='meta', size=(4096,), requires_grad=True)
So: ValueError: tensor is on the meta device, we need a value to put in on 0.
this new tensor needs to be initialized randomly during training
So make sure you properly initialize that weight in the _init_weights function of your custom model.
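A minimal sketch of that pattern, using a stand-in module rather than the actual LlamaModel subclass (names and the std value are illustrative; in transformers, _init_weights is invoked on each submodule during from_pretrained to fill weights that have no checkpoint entry):

```python
import torch

class PromptedModel(torch.nn.Module):
    """Stand-in for a custom model that adds a prompt parameter."""

    def __init__(self, embed_dim=4096):
        super().__init__()
        # New parameter with no counterpart in the checkpoint.
        self.prompt = torch.nn.Parameter(torch.randn(embed_dim), requires_grad=True)

    def _init_weights(self, module):
        # Give the new tensor a concrete random value here, so the
        # loader can materialize it instead of leaving it on meta.
        if getattr(module, "prompt", None) is not None:
            module.prompt.data.normal_(mean=0.0, std=0.02)

model = PromptedModel(embed_dim=8)
model._init_weights(model)
print(model.prompt.is_meta)  # False: the parameter has real data
```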
The tensor we are adding, self.prompt = torch.nn.parameter.Parameter(torch.randn(self.embed_dim), requires_grad=True), has been initialized.
ValueError: prompt is on the meta device, we need a value to put in on 0.
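If the error persists, a pragmatic workaround (a sketch; materialize_prompt is a hypothetical helper, not part of transformers or accelerate) is to replace the meta parameter with a freshly initialized one on a real device after loading:

```python
import torch

class Holder(torch.nn.Module):
    """Stand-in for the loaded model: its prompt is stuck on meta."""

    def __init__(self):
        super().__init__()
        self.prompt = torch.nn.Parameter(torch.empty(8, device="meta"))

def materialize_prompt(module, embed_dim, device="cpu"):
    # Swap the shape-only meta tensor for a real, randomly
    # initialized parameter on a concrete device (e.g. "cuda:0").
    if module.prompt.is_meta:
        module.prompt = torch.nn.Parameter(
            torch.randn(embed_dim, device=device), requires_grad=True
        )

m = Holder()
materialize_prompt(m, 8)
print(m.prompt.is_meta)  # False
```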