
Error when running examples with multiple GPUs

imajiayu opened this issue

I expected to be able to use DataParallel and DistributedDataParallel with examples/sequence_classification/LoRA.ipynb. For DP, I only added one line:

model = torch.nn.DataParallel(model.cuda(), device_ids=[0, 1], output_device=gpus[0])

This raises:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)

model: "bigscience/bloom-560m"

imajiayu avatar Apr 12 '23 13:04 imajiayu

Hello @imajiayu, could you try with the main branch?
[screenshot attached]

pacman100 avatar Apr 14 '23 12:04 pacman100

@pacman100 I'm using peft 0.3.0.dev0. Here is the DataParallel-wrapped model:

DataParallel(
  (module): PeftModelForSequenceClassification(
    (base_model): LoraModel(
      (model): BloomForSequenceClassification(
        (transformer): BloomModel(
          (word_embeddings): Embedding(250880, 1024)
          (word_embeddings_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (h): ModuleList(
            (0-23): 24 x BloomBlock(
              (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (self_attention): BloomAttention(
                (query_key_value): Linear(
                  in_features=1024, out_features=3072, bias=True
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=1024, out_features=8, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=8, out_features=3072, bias=False)
                  )
                )
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (attention_dropout): Dropout(p=0.0, inplace=False)
              )
              (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): BloomMLP(
                (dense_h_to_4h): Linear(in_features=1024, out_features=4096, bias=True)
                (gelu_impl): BloomGelu()
                (dense_4h_to_h): Linear(in_features=4096, out_features=1024, bias=True)
              )
            )
          )
          (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        )
        (score): ModulesToSaveWrapper(
          (original_module): Linear(in_features=1024, out_features=2, bias=False)
          (modules_to_save): ModuleDict(
            (default): Linear(in_features=1024, out_features=2, bias=False)
          )
        )
      )
    )
  )
)
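
A quick check I find useful here, just to rule out a parameter sitting on the wrong device before DataParallel replicates the module (my own addition, not from the notebook):

devices = {p.device for p in model.module.parameters()}
print(devices)  # should be a single entry, e.g. {device(type='cuda', index=0)}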

imajiayu avatar Apr 17 '23 03:04 imajiayu

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar May 12 '23 15:05 github-actions[bot]

I also face this problem. How can it be solved?

shangqing-liu avatar May 15 '23 15:05 shangqing-liu

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Jun 09 '23 15:06 github-actions[bot]

> I also face this problem. How can it be solved?

I have the same problem at inference ... I worked around it by adding

device = "cuda"
inputs = tokenizer(sample, truncation=True, max_length=512, padding="longest", return_tensors="pt").input_ids.to(device)

but then I run into another inexplicable issue at inference ... oof

outputs = model(**inputs).logits
TypeError: argument after ** must be a mapping, not Tensor

maadnfritz avatar Jun 15 '23 23:06 maadnfritz

@maadnfritz can you try:

outputs = model(input_ids=inputs).logits
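
Or, a slightly fuller sketch of what I mean, assuming sample is a string and model is your sequence-classification model: keeping the whole tokenizer output (instead of taking .input_ids) gives you a mapping, so the ** unpacking from your original snippet also works:

device = "cuda"
enc = tokenizer(sample, truncation=True, max_length=512, padding="longest", return_tensors="pt").to(device)
with torch.no_grad():
    # enc is a BatchEncoding (a mapping), so ** unpacking is valid here
    logits = model(**enc).logits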

I will close this issue as it seems solved. Feel free to open a new ticket for your issue with a small reproducible snippet! Thanks

younesbelkada avatar Jun 21 '23 15:06 younesbelkada