alpaca-lora
evaluation loss is NaN
When finetuning the alpaca-lora model, I applied the LoRA modules to the attention layers {q_proj, v_proj} and the evaluation loss came out as NaN. However, if I apply the LoRA modules to the attention layers {q_proj, v_proj, o_proj, k_proj}, the evaluation loss becomes normal. I am not sure why this is happening.
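For context, here is a minimal sketch of the two adapter configurations being compared, written against the standard peft LoraConfig API; the r / lora_alpha / lora_dropout values are only illustrative, not necessarily the ones used above:

from peft import LoraConfig

# variant that produced the NaN eval loss: adapters only on q_proj and v_proj
config_qv = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# variant that trained normally: adapters on all four attention projections
config_qkvo = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)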
I have the same question. Have you figured it out? @victorzhz111
Not yet. When I applied the LoRA modules to all 4 attention weights in the 13B model, the eval loss was NaN again. @LiuPearl1
@victorzhz111 Is your GPU a V100? I found that the V100 always hits this problem, but I don't know why.
@LiuPearl1 Yes, maybe it is because the V100 is not an Ampere-architecture GPU.
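If it helps to confirm this, here is a quick check of the GPU's compute capability and bf16 support (assumes PyTorch with CUDA available); the V100 is Volta, compute capability 7.0, so bf16 is not available and fp16 is its only reduced-precision option:

import torch

# report compute capability and bf16 support of the current GPU;
# Ampere cards report 8.x and support bf16, V100 (Volta) reports 7.0 and does not
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
print("bf16 supported:", torch.cuda.is_bf16_supported())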
I'm hitting this issue too. My GPU is a V100.
I'm also using a V100. Currently I modified the model loading as follows; note the torch dtype:
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=False,            # disable 8-bit quantized loading
    torch_dtype=torch.float32,     # keep weights in fp32 instead of fp16
    llm_int8_skip_modules=FULL_FINETUNE_MODULES,
    device_map=device_map,
    cache_dir='../huggingface'
)
and added this:
model = get_peft_model(model, config).to(torch.float32)
My training loss has not dropped to 0.0 so far, so I think this is the fix for the V100, but the training speed is very slow.
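For anyone verifying the workaround, a minimal sanity check that a forward pass produces a finite eval loss; eval_batch here is a hypothetical tokenized batch (input_ids, attention_mask, labels) already on the model's device:

import torch

# quick check that the model produces a finite loss on one eval batch
model.eval()
with torch.no_grad():
    out = model(**eval_batch)
print("eval loss:", out.loss.item())
assert torch.isfinite(out.loss).item(), "eval loss is NaN or Inf"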