ColossalAI
[BUG]: `LazyInitContext` with Baichuan2 caused `loss=LazyTensor([])`
🐛 Describe the bug
When training a Baichuan2 model with the Gemini plugin and LazyInitContext, the output logits are LazyTensors and the loss is LazyTensor([]), which causes "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn".
To Reproduce
from contextlib import nullcontext

from transformers import AutoModelForCausalLM

from colossalai.booster.plugin import GeminiPlugin
from colossalai.lazy import LazyInitContext
from colossalai.utils import get_current_device

# freeze_non_embeds_parameters is a local training-script helper.
init_ctx = (
    LazyInitContext(default_device=get_current_device()) if isinstance(plugin, GeminiPlugin) else nullcontext()
)
with init_ctx:
    model_path = "baichuan-inc/Baichuan2-7B-Chat"
    if model_path:
        model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
        if args.freeze_non_embeds_params:
            freeze_non_embeds_parameters(model=model)
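For reference, the resulting RuntimeError can be reproduced in miniature with plain PyTorch: a loss tensor that is detached from the autograd graph (which is effectively what an unmaterialized LazyTensor loss looks like) fails on backward() with the same message. This is just an illustrative sketch, not ColossalAI code:

```python
import torch

# A tensor with no grad_fn and requires_grad=False, analogous to a loss
# computed from a LazyTensor that was never materialized into a real
# parameter tensor on the autograd graph.
loss = torch.tensor(0.0)

try:
    loss.backward()
    msg = ""
except RuntimeError as e:
    msg = str(e)
    print(msg)  # element 0 of tensors does not require grad ...
```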
Expected behavior
The output should be a ColoTensor, as it is for Baichuan1.
Environment
colossalai==0.3.4
python==3.9.18
cuda==11.7
torch==2.0.1+cu117
transformers==4.33.1
flash-attn==2.3.2
xformers==0.0.21