
Unable to get Reproducibility of Tensors on GPU | Exercises | PyTorch Fundamentals

Open ashuRMS opened this issue 1 year ago • 3 comments

When using torch.cuda.manual_seed_all(1234), I cannot reproduce the same tensor on the GPU across runs. I also tried setting torch.backends.cudnn.benchmark = False, but it did not help.

I could only get reproducible tensors by using torch.manual_seed(1234).

Why does the manual seeding not work on CUDA? What mistake am I making?


ashuRMS avatar Dec 18 '23 17:12 ashuRMS

I think by default the tensors are created on the CPU, and you have used only torch.cuda.manual_seed and not torch.manual_seed, so the tensors you created are still random. Try it once and let me know.
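
A rough sketch of what I mean (assuming the tensor is created on the CPU, which is the default):

import torch

# Seeding only the CUDA generators leaves the CPU generator untouched,
# so this tensor changes between runs.
torch.cuda.manual_seed_all(1234)
cpu_only_cuda_seeded = torch.rand(2, 3)

# torch.manual_seed seeds the CPU generator (and the CUDA ones as well),
# so this tensor is identical every time the script runs.
torch.manual_seed(1234)
fully_seeded = torch.rand(2, 3)

print(cpu_only_cuda_seeded)
print(fully_seeded)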

bhuvanmdev avatar Dec 22 '23 05:12 bhuvanmdev

I believe I'm experiencing the same problem. Here is a short script that shows it in action:

import torch

print(torch.__version__)
for device in ["cpu", "cuda"]:
  with torch.device(device):
    torch.manual_seed(42)
    model = torch.nn.Sequential(
      torch.nn.Linear(1, 1)
    )
    print(model.state_dict())

The results are (no matter how many times you run it):

2.1.0+cu121
OrderedDict([('0.weight', tensor([[0.7645]])), ('0.bias', tensor([0.8300]))])
OrderedDict([('0.weight', tensor([[0.2259]], device='cuda:0')), ('0.bias', tensor([0.9754], device='cuda:0'))])

I expected the initial values to be the same in both cases. Because they differ, the CPU version converges and I get better predictions, while the GPU version is faster but inaccurate (due to the different initial values).
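
If the goal is identical initial weights on both devices, initializing on the CPU and then moving the model should work (a rough sketch, not necessarily the intended approach):

import torch

torch.manual_seed(42)
model = torch.nn.Sequential(
  torch.nn.Linear(1, 1)
)  # weights are drawn from the CPU generator here
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)  # moving preserves the values, so they match the CPU run
print(model.state_dict())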

In my actual script, I initialize things as follows (which ends up with similar results):

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
torch.set_default_device(device)

My understanding is that when using set_default_device, I don't have to call .to(device) or pass the device explicitly as a parameter, and whatever device I set becomes the default for newly created tensors.
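
A quick way to see that this is really about which random generator gets used (a sketch, assuming a CUDA device is available):

import torch

torch.set_default_device("cuda")     # factory calls now allocate on the GPU
torch.manual_seed(42)
x = torch.rand(1, 1)                 # drawn from the CUDA generator

torch.manual_seed(42)
y = torch.rand(1, 1, device="cuda")  # same values as x

torch.manual_seed(42)
z = torch.rand(1, 1, device="cpu")   # different values: CPU generator
print(x, y, z)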

I'm open to suggestions, although I found that if the seed behavior is expected to differ depending on the device, adding a momentum parameter to the optimizer improves the results more than changing the seed does.

agalue avatar Jan 13 '24 13:01 agalue

The problem you are facing is not related to torch.cuda.manual_seed(), because you are creating your tensor on the CPU and moving it to the GPU. Another note: if you want both tensors to be the same, you have to call manual_seed twice. If you call it only once, the two tensors will differ from each other, but each of them will be the same every time you run the script. I leave the code below:

  1. This way both tensors are the same, because I call manual_seed before creating each one:
import torch

RANDOM_SEED = 1234

torch.manual_seed(RANDOM_SEED)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device {device}\n")

cpu_tensor_1 = torch.rand(2, 3)
gpu_tensor_1 = cpu_tensor_1.to(device)

torch.manual_seed(RANDOM_SEED)

cpu_tensor_2 = torch.rand(2, 3)
gpu_tensor_2 = cpu_tensor_2.to(device)

cpu_tensor_1, cpu_tensor_2

Output:

Device cuda

(tensor([[0.0290, 0.4019, 0.2598],
         [0.3666, 0.0583, 0.7006]]),
 tensor([[0.0290, 0.4019, 0.2598],
         [0.3666, 0.0583, 0.7006]]))
  2. Now the tensors are different, but they will be the same each time you run it:
RANDOM_SEED = 1234

torch.manual_seed(RANDOM_SEED)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device {device}\n")

cpu_tensor_1 = torch.rand(2, 3)
gpu_tensor_1 = cpu_tensor_1.to(device)

cpu_tensor_2 = torch.rand(2, 3)
gpu_tensor_2 = cpu_tensor_2.to(device)

cpu_tensor_1, cpu_tensor_2

Output:

Device cuda

(tensor([[0.0290, 0.4019, 0.2598],
         [0.3666, 0.0583, 0.7006]]),
 tensor([[0.0518, 0.4681, 0.6738],
         [0.3315, 0.7837, 0.5631]]))
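
If you want to create the tensor directly on the GPU and still get the same values on every run, seeding right before creation also works (a small sketch; note the values will not match the CPU-created ones, because the CUDA generator produces a different stream for the same seed):

import torch

RANDOM_SEED = 1234
device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(RANDOM_SEED)  # also seeds the CUDA generators
gpu_tensor_3 = torch.rand(2, 3, device=device)  # reproducible across runs
print(gpu_tensor_3)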

I hope it works.

ErikSarriegui avatar Jan 29 '24 10:01 ErikSarriegui