LLMs-from-scratch
Expected all tensors to be on the same device
There appears to be an issue when running the code from chapter 6 (other sections not tested):
Error
Traceback (most recent call last):
File "/home/user/workspace/project/llm/tune_incl.py", line 359, in <module>
train_losses, val_losses, train_accs, val_accs, examples_seen = train_classifier_simple(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/workspace/project/llm/tune_incl.py", line 155, in train_classifier_simple
loss = calc_loss_batch(input_batch, target_batch, model, device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/workspace/project/llm/tune_incl.py", line 112, in calc_loss_batch
logits = model(input_batch)[:, -1, :] # Logits of last output token
^^^^^^^^^^^^^^^^^^
File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/workspace/project/llm/util.py", line 173, in forward
return logits
^^^^^
File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 116, in forward
return F.linear(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
Cause
I narrowed it down to this line:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/gpt-class-finetune.py#L398
This replaces the output layer after the model has already been moved to the GPU, which later triggers the error above: the newly constructed layer's parameters live on the CPU while the rest of the model is on `cuda:0`.
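The underlying behavior can be reproduced independently of the book's code: a freshly constructed `torch.nn.Linear` always allocates its parameters on the CPU, no matter where the rest of the model lives. A minimal sketch (the toy model and the `out_head` attribute name are illustrative):

```python
import torch

# Toy model standing in for the GPT model.
model = torch.nn.Sequential(torch.nn.Linear(8, 8))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # moves all *existing* parameters to the target device

# Replacing a submodule AFTER .to(device): the new layer's parameters
# are created on the CPU, not on `device`.
model.out_head = torch.nn.Linear(8, 2)

print({p.device.type for p in model.parameters()})
# On a CUDA machine this prints {'cuda', 'cpu'} — the mixed-device state
# that raises the RuntimeError in the forward pass; on a CPU-only
# machine it prints {'cpu'}.
```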
Solution
The issue can be fixed by adding a second `model.to(device)` call just after that statement:
[...]
num_classes = 2
model.out_head = torch.nn.Linear(in_features=BASE_CONFIG["emb_dim"], out_features=num_classes)
# add this to move all model parameters (including the new output head) to the GPU
model = model.to(device)
[...]
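As a sanity check after replacing a submodule, one can verify that every parameter ended up on a single device before training starts. A small helper sketch (not part of the repository code; the function name is illustrative):

```python
import torch

def assert_single_device(model: torch.nn.Module) -> torch.device:
    """Raise if the model's parameters are spread across multiple devices."""
    devices = {p.device for p in model.parameters()}
    if len(devices) != 1:
        raise RuntimeError(f"Parameters found on multiple devices: {devices}")
    return devices.pop()

# Example with a toy model:
model = torch.nn.Linear(4, 2)
print(assert_single_device(model))  # the single device all parameters share
```

Calling this right after the `model = model.to(device)` line would have surfaced the mixed-device state immediately, instead of deep inside the training loop.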