
Expected all tensors to be on the same device

TimRepke opened this issue 1 year ago • 0 comments

There appears to be an issue when running the code from chapter 6 (I have not tested other sections):

Error

Traceback (most recent call last):
  File "/home/user/workspace/project/llm/tune_incl.py", line 359, in <module>
    train_losses, val_losses, train_accs, val_accs, examples_seen = train_classifier_simple(
                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/tune_incl.py", line 155, in train_classifier_simple
    loss = calc_loss_batch(input_batch, target_batch, model, device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/tune_incl.py", line 112, in calc_loss_batch
    logits = model(input_batch)[:, -1, :]  # Logits of last output token
             ^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/util.py", line 173, in forward
    return logits
             ^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

Cause

I narrowed it down to this line:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/gpt-class-finetune.py#L398

This line replaces the output layer after the model has already been moved to the GPU, so the new layer's parameters stay on the CPU and later trigger the error above.

Solution

The issue can be fixed by moving the model to the device again, just after that statement:

[...]
num_classes = 2
model.out_head = torch.nn.Linear(in_features=BASE_CONFIG["emb_dim"], out_features=num_classes)
# add this to move all model parameters to GPU
model = model.to(device)
[...]
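As a self-contained illustration of the fix (a minimal sketch using a stand-in two-layer model rather than the chapter's GPT model), the snippet below shows that a layer replaced after `.to(device)` lands on the CPU by default, and that calling `model.to(device)` again restores device consistency. Alternatively, `torch.nn.Linear` accepts a `device=` argument, so the replacement head can be constructed on the right device directly:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model: the chapter uses a GPT model, but the mechanics are the same
model = torch.nn.Sequential(
    torch.nn.Embedding(10, 8),
    torch.nn.Linear(8, 8),
)
model.to(device)  # move all current parameters to the target device

# Replacing a submodule afterwards creates fresh CPU parameters by default
num_classes = 2
model[1] = torch.nn.Linear(in_features=8, out_features=num_classes)

# Fix: move the whole model (including the new head) to the device again
model = model.to(device)

# Alternative: construct the replacement directly on the target device
# model[1] = torch.nn.Linear(8, num_classes, device=device)

# All parameters now live on a single device, so forward passes won't mix devices
devices = {p.device for p in model.parameters()}
print(len(devices))  # 1
```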

TimRepke — May 22 '24 18:05