
[Bug] RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

Open GuiQuQu opened this issue 5 months ago • 0 comments

Checklist

  • [X] 1. I have searched related issues but cannot get the expected help.
  • [X] 2. The bug has not been fixed in the latest version.
  • [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

The error occurs in the forward function of InternVLChatModel:

        B, N, C = input_embeds.shape
        input_embeds = input_embeds.reshape(B * N, C)

        if torch.distributed.get_rank() == 0:
            print(f'dynamic ViT batch size: {vit_batch_size}, images per sample: {vit_batch_size / B}, dynamic token length: {N}')
        # breakpoint()
        input_ids = input_ids.reshape(B * N)
        selected = (input_ids == self.img_context_token_id)
        try:
            # ERROR in here
            input_embeds[selected] = input_embeds[selected] * 0.0  + vit_embeds.reshape(-1, C)
        except Exception as e:
            vit_embeds = vit_embeds.reshape(-1, C)
            print(f'warning: {e}, input_embeds[selected].shape={input_embeds[selected].shape}, '
                  f'vit_embeds.shape={vit_embeds.shape}')
            n_token = selected.sum()
            input_embeds[selected] = input_embeds[selected] * 0.0 + vit_embeds[:n_token]

        input_embeds = input_embeds.reshape(B, N, C)

The line `input_embeds[selected] = input_embeds[selected] * 0.0 + vit_embeds.reshape(-1, C)` raises the error.

The error means that `input_embeds` is (a view of) a leaf Variable that requires grad, and the masked assignment `input_embeds[selected] = ...` writes into it in place, which autograd forbids. However, there is no error when I train in full-parameter mode.
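A minimal sketch of the failure mode described above (this is an illustrative reproduction, not the actual InternVL code): an in-place write into a view of a leaf tensor that requires grad raises exactly this RuntimeError, and cloning the leaf first is one possible workaround, since the clone is a non-leaf tensor that autograd can still track back to the original.

```python
import torch

# A leaf tensor that requires grad, standing in for the embeddings that
# end up as a grad-requiring leaf under the LoRA/frozen-parameter setup.
base = torch.randn(4, 8, requires_grad=True)

# reshape() returns a view; writing into it in place is rejected.
view = base.reshape(2, 16)
try:
    view[0] = 0.0  # in-place op on a view of a leaf that requires grad
except RuntimeError as e:
    print(f"reproduced: {e}")

# Possible workaround: clone before reshaping, so the in-place write
# targets a non-leaf tensor. Gradients still flow back to `base`.
safe = base.clone().reshape(2, 16)
safe[0] = 0.0
safe.sum().backward()
print(base.grad is not None)  # gradients reached the original leaf
```

The same idea applied to the snippet above would be to replace the masked in-place assignment with an out-of-place construction, e.g. `torch.where` or `masked_scatter`, or to clone `input_embeds` before the write.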

Reproduction

I use the InternVL2-2B model and train a LoRA model on an RTX 4090. My LoRA config:

    "use_backbone_lora": 0,
    "use_llm_lora": 64,
    "freeze_vision_model": true,
    "freeze_llm_model": true,
    "freeze_mlp": true

Environment

transformers==4.44.2
torch==2.1.2
torchvision==0.16.2 
torchaudio==2.1.2
einops
tensorboard
accelerate==0.34.2
numpy<=1.26.4
deepspeed==0.15.1
peft
timm
ipykernel
sentencepiece
ninja
flash-attn

Error traceback

No response

GuiQuQu avatar Sep 15 '24 17:09 GuiQuQu