Unable to get Reproducibility of Tensors on GPU | Exercises | PyTorch Fundamentals
When using `torch.cuda.manual_seed_all(1234)`, I cannot reproduce the same tensor on the GPU. I also tried setting `torch.backends.cudnn.benchmark = False`, but it did not help. I could only get reproducible tensors by using `torch.manual_seed(1234)`.

Why does the manual seeding not work on CUDA? What mistake am I making?
I think by default the tensors are created on the CPU, and you have used only `torch.cuda.manual_seed` and not `torch.manual_seed`, so the tensors created are still random. Try it once and let me know.
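For example (a quick sketch; the seed value here is just illustrative): seeding only the CUDA generator leaves CPU tensor creation random, while `torch.manual_seed` makes it reproducible.

```python
import torch

# Seeding only the CUDA generator does not affect the CPU generator,
# so two CPU tensors will still differ.
torch.cuda.manual_seed(1234)
a = torch.rand(2, 3)  # created on the CPU
torch.cuda.manual_seed(1234)
b = torch.rand(2, 3)
print(torch.equal(a, b))  # False

# Seeding the CPU generator makes CPU tensor creation reproducible.
torch.manual_seed(1234)
c = torch.rand(2, 3)
torch.manual_seed(1234)
d = torch.rand(2, 3)
print(torch.equal(c, d))  # True
```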
I believe I'm experiencing the same problem. Here is a short way to see it in action:
```python
import torch

print(torch.__version__)

for device in ["cpu", "cuda"]:
    with torch.device(device):
        torch.manual_seed(42)
        model = torch.nn.Sequential(
            torch.nn.Linear(1, 1)
        )
        print(model.state_dict())
```
The results are (no matter how many times you run it):
```
2.1.0+cu121
OrderedDict([('0.weight', tensor([[0.7645]])), ('0.bias', tensor([0.8300]))])
OrderedDict([('0.weight', tensor([[0.2259]], device='cuda:0')), ('0.bias', tensor([0.9754], device='cuda:0'))])
```
I expected the default values in both cases to be the same. Because they differ, the CPU version converges and I get better predictions, while the GPU version is faster but inaccurate (due to the different default initialization).
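For what it's worth, I can force matching parameters by initializing on the CPU first and then moving the model (a sketch, assuming a CUDA device is available; moving a tensor preserves its values):

```python
import torch

torch.manual_seed(42)
model = torch.nn.Sequential(torch.nn.Linear(1, 1))  # initialized with the CPU generator
model = model.to("cuda")  # the .to() copy keeps the same weights, now on cuda:0
print(model.state_dict())  # matches the CPU run above
```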
In my actual script, I initialize things as follows (which produces similar results):
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
torch.set_default_device(device)
My understanding is that when using `set_default_device`, I don't have to call `.to(device)` or pass the `device` explicitly as a parameter, and whatever I use will be the new default.
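If my reading is right, that also means factory functions like `torch.rand` draw from the default device's own generator, which would explain the mismatch (a sketch, assuming a CUDA device is available):

```python
import torch

torch.set_default_device("cuda")
torch.manual_seed(42)   # seeds the CPU and all CUDA generators
x = torch.rand(2)       # allocated on cuda:0, drawn from the CUDA generator

torch.set_default_device("cpu")
torch.manual_seed(42)
y = torch.rand(2)       # allocated on the CPU, drawn from the CPU generator

print(x.cpu(), y)       # different values despite the same seed
```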
I'm open to suggestions, although I found that if the seed behavior is expected to differ depending on the device kind, adding a `momentum` parameter to the optimizer improves the results more than changing the seed does.
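For reference, this is the kind of change I mean (a sketch; the `lr` and `momentum` values are just illustrative, not tuned):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(1, 1))
# SGD with momentum smooths the updates; for me this helped more
# than trying to match the seed across devices.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```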
The problem you are facing is not related to `torch.cuda.manual_seed()`, because you are creating your tensor on the CPU and moving it to the GPU. Another note: if you want both tensors to be the same, you have to call `manual_seed` twice. If you call it only once, the two tensors will be different, but each of them will be the same each time you run it. I've left code below:
- This way both tensors are the same, because I called `manual_seed` before creating each one:
```python
import torch

RANDOM_SEED = 1234
torch.manual_seed(RANDOM_SEED)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device {device}\n")

cpu_tensor_1 = torch.rand(2, 3)
gpu_tensor_1 = cpu_tensor_1.to(device)

torch.manual_seed(RANDOM_SEED)
cpu_tensor_2 = torch.rand(2, 3)
gpu_tensor_2 = cpu_tensor_2.to(device)

cpu_tensor_1, cpu_tensor_2
```
Output:
```
Device cuda

(tensor([[0.0290, 0.4019, 0.2598],
         [0.3666, 0.0583, 0.7006]]),
 tensor([[0.0290, 0.4019, 0.2598],
         [0.3666, 0.0583, 0.7006]]))
```
- Now the tensors are different, but they will be the same each time you run it:
```python
import torch

RANDOM_SEED = 1234
torch.manual_seed(RANDOM_SEED)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device {device}\n")

cpu_tensor_1 = torch.rand(2, 3)
gpu_tensor_1 = cpu_tensor_1.to(device)

cpu_tensor_2 = torch.rand(2, 3)
gpu_tensor_2 = cpu_tensor_2.to(device)

cpu_tensor_1, cpu_tensor_2
```
Output:
```
Device cuda

(tensor([[0.0290, 0.4019, 0.2598],
         [0.3666, 0.0583, 0.7006]]),
 tensor([[0.0518, 0.4681, 0.6738],
         [0.3315, 0.7837, 0.5631]]))
```
I hope it works.