Problem: when merging chatglm3 with example data (`mix_models_with_data`), it raises the following error:
```
Traceback (most recent call last):
  File "~/llm_cocktail/mix_mdl.py", line 67, in <module>
    model2 = mix_models_with_data(
  File "~/miniconda3/envs/train_py310/lib/python3.10/site-packages/LM_Cocktail/cocktail.py", line 102, in mix_models_with_data
    weights = compute_weights(model, tokenizer=tokenizer, param_list=param_list, model_type=model_type,
  File "~/miniconda3/envs/train_py310/lib/python3.10/site-packages/LM_Cocktail/utils.py", line 135, in compute_weights
    loss = loss_func(base_model=base_model, input_data=input_data)
  File "~/miniconda3/envs/train_py310/lib/python3.10/site-packages/LM_Cocktail/utils.py", line 230, in llm_loss
    output = base_model(**data)
  File "~/miniconda3/envs/train_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/miniconda3/envs/train_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 937, in forward
    transformer_outputs = self.transformer(
  File "~/miniconda3/envs/train_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/miniconda3/envs/train_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 819, in forward
    full_attention_mask = self.get_masks(input_ids, past_key_values, padding_mask=attention_mask)
  File "~/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 690, in get_masks
    full_attention_mask -= padding_mask.unsqueeze(-1) - 1
RuntimeError: Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or logical_not() operator instead.
```
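The failure happens because ChatGLM3's `get_masks` does integer arithmetic on the padding mask (`full_attention_mask -= padding_mask.unsqueeze(-1) - 1`), but the `attention_mask` reaching it is a bool tensor, which PyTorch's `-` operator rejects. Below is a minimal sketch of one possible workaround, assuming the bool mask is produced upstream and passed through unchanged into `base_model(**data)` in LM_Cocktail's `llm_loss`: cast `attention_mask` to an integer dtype before the forward call. `cast_bool_attention_mask` is a hypothetical helper name for illustration, not part of LM_Cocktail.

```python
import torch

# Reproduce the failure mode in isolation: arithmetic on a bool tensor.
padding_mask = torch.ones(1, 8, dtype=torch.bool)
# padding_mask.unsqueeze(-1) - 1  # -> RuntimeError: Subtraction, the - operator,
#                                 #    with a bool tensor is not supported.

# Hypothetical workaround: cast the mask to an integer dtype before the
# batch reaches base_model(**data), e.g. just before the forward call.
def cast_bool_attention_mask(data: dict) -> dict:
    mask = data.get("attention_mask")
    if mask is not None and mask.dtype == torch.bool:
        data["attention_mask"] = mask.to(torch.long)
    return data

# Usage: the cast mask now survives get_masks' subtraction.
data = {
    "input_ids": torch.tensor([[1, 2, 3]]),
    "attention_mask": torch.ones(1, 3, dtype=torch.bool),
}
data = cast_bool_attention_mask(data)
assert data["attention_mask"].dtype == torch.long
```

An alternative is to patch `get_masks` in `modeling_chatglm.py` to invert with `~padding_mask` / `logical_not()` as the error message suggests, but casting the mask on the caller's side keeps the downloaded model code untouched.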