ColossalAI
[BUG]: `LazyInitContext` with Baichuan2 caused `loss=LazyTensor([])`
🐛 Describe the bug
When training a Baichuan2 model with the Gemini plugin and LazyInitContext, the output logits are LazyTensors and the loss is LazyTensor([]), which causes "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn".
To Reproduce
from contextlib import nullcontext

from transformers import AutoModelForCausalLM

from colossalai.booster.plugin import GeminiPlugin
from colossalai.lazy import LazyInitContext
from colossalai.utils import get_current_device

# freeze_non_embeds_parameters is a local training-script helper.
init_ctx = (
    LazyInitContext(default_device=get_current_device()) if isinstance(plugin, GeminiPlugin) else nullcontext()
)
with init_ctx:
    model_path = "baichuan-inc/Baichuan2-7B-Chat"
    if model_path:
        model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
        if args.freeze_non_embeds_params:
            freeze_non_embeds_parameters(model=model)
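For reference, the resulting RuntimeError can be reproduced in miniature with plain PyTorch: a loss tensor that is detached from the autograd graph (which is effectively what an unmaterialized LazyTensor loss looks like) fails on backward() with the same message. This is just an illustrative sketch, not ColossalAI code:

```python
import torch

# A tensor with no grad_fn and requires_grad=False, analogous to a loss
# computed from a LazyTensor that was never materialized into a real
# parameter tensor on the autograd graph.
loss = torch.tensor(0.0)

try:
    loss.backward()
    msg = ""
except RuntimeError as e:
    msg = str(e)
    print(msg)  # element 0 of tensors does not require grad ...
```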
Expected behavior
The output should be a ColoTensor, as it is for Baichuan1.
Environment
colossalai==0.3.4
python==3.9.18
cuda==11.7
torch==2.0.1+cu117
transformers==4.33.1
flash-attn==2.3.2
xformers==0.0.21