
[BUG] - The test error is consistently equal to 0 when I execute the Quickstart tutorial on the M2 ProMax.

Open alphagw opened this issue 1 year ago • 2 comments

Add Link

beginner_source/basics/quickstart_tutorial.py

Describe the bug

When executing the Quickstart tutorial on the M2 ProMax, the reported test accuracy stays at 0.0% in every epoch, even though the average loss decreases.

The code is identical to the tutorial source linked here; nothing has been modified.

https://github.com/pytorch/tutorials/blob/main/beginner_source/basics/quickstart_tutorial.py

Error message:

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
Using mps device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
Epoch 1
-------------------------------
loss: 2.311726  [   64/60000]
loss: 2.294588  [ 6464/60000]
loss: 2.276271  [12864/60000]
loss: 2.260946  [19264/60000]
loss: 2.237525  [25664/60000]
loss: 2.206861  [32064/60000]
loss: 2.221519  [38464/60000]
loss: 2.183903  [44864/60000]
loss: 2.185328  [51264/60000]
loss: 2.132372  [57664/60000]
Test Error: 
 Accuracy: 0.0%, Avg loss: 2.145295 

Epoch 2
-------------------------------
loss: 2.162894  [   64/60000]
loss: 2.153250  [ 6464/60000]
loss: 2.093136  [12864/60000]
loss: 2.102805  [19264/60000]
loss: 2.044294  [25664/60000]
loss: 1.984252  [32064/60000]
loss: 2.014595  [38464/60000]
loss: 1.927812  [44864/60000]
loss: 1.940063  [51264/60000]
loss: 1.844966  [57664/60000]
Test Error: 
 Accuracy: 0.0%, Avg loss: 1.863020 

Epoch 3
-------------------------------
loss: 1.903860  [   64/60000]
loss: 1.871587  [ 6464/60000]
loss: 1.749879  [12864/60000]
loss: 1.784963  [19264/60000]
loss: 1.671392  [25664/60000]
loss: 1.630241  [32064/60000]
loss: 1.649708  [38464/60000]
loss: 1.547975  [44864/60000]
loss: 1.578283  [51264/60000]
loss: 1.459987  [57664/60000]
Test Error: 
 Accuracy: 0.0%, Avg loss: 1.493550 

Epoch 4
-------------------------------
loss: 1.567270  [   64/60000]
loss: 1.531002  [ 6464/60000]
loss: 1.378248  [12864/60000]
loss: 1.449183  [19264/60000]
loss: 1.325287  [25664/60000]
loss: 1.331857  [32064/60000]
loss: 1.349332  [38464/60000]
loss: 1.269045  [44864/60000]
loss: 1.308438  [51264/60000]
loss: 1.205329  [57664/60000]
Test Error: 
 Accuracy: 0.0%, Avg loss: 1.238385 

Epoch 5
-------------------------------
loss: 1.321056  [   64/60000]
loss: 1.299582  [ 6464/60000]
loss: 1.131391  [12864/60000]
loss: 1.237661  [19264/60000]
loss: 1.105717  [25664/60000]
loss: 1.140301  [32064/60000]
loss: 1.168805  [38464/60000]
loss: 1.099758  [44864/60000]
loss: 1.142098  [51264/60000]
loss: 1.057198  [57664/60000]
Test Error: 
 Accuracy: 0.0%, Avg loss: 1.082316 

Done!
Saved PyTorch Model State to model.pth

Traceback (most recent call last):
  File "/*****/quick_start.py", line 430, in <module>
    run_github()
  File "/*****//quick_start.py", line 420, in run_github
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
IndexError: list index out of range

Process finished with exit code 1
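
The falling loss, the 0.0% accuracy, and the IndexError raised by `classes[pred[0].argmax(0)]` would all be consistent with `argmax` returning unexpected indices on the `mps` device. That is only a guess, not a confirmed diagnosis. A minimal sketch for checking it, using a hypothetical helper `argmax_matches_cpu` (not part of the tutorial), compares the device result against the CPU result on the same logits:

```python
import torch


def argmax_matches_cpu(logits_cpu: torch.Tensor, device: str) -> bool:
    """Return True if argmax over dim 1 on `device` agrees with the CPU result."""
    expected = logits_cpu.argmax(1)
    actual = logits_cpu.to(device).argmax(1).cpu()
    return torch.equal(expected, actual)


# Fall back to CPU when the MPS backend is not available on this machine.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Random logits shaped like one batch from the tutorial: 64 samples, 10 classes.
logits = torch.randn(64, 10)
print(f"argmax on {device} matches CPU: {argmax_matches_cpu(logits, device)}")
```

If this prints `False` on the affected machine, the problem likely lies in the backend rather than in the tutorial code.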

Describe your environment

  • macOS
  • torch version: 2.0.1
  • no cuda, torch.device = 'mps'

cc @suraj813

alphagw • Jun 30 '23 08:06

When I forced the device to CPU, everything worked fine.

alphagw • Jun 30 '23 08:06
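
For anyone who wants to repeat the CPU workaround without editing the rest of the script, here is a minimal sketch. The helper `pick_device` and its `force_cpu` flag are hypothetical names introduced for illustration; the selection order (CUDA, then MPS, then CPU) is assumed to mirror the tutorial's device check.

```python
import torch


def pick_device(force_cpu: bool = False) -> str:
    """Choose a device string, optionally bypassing any accelerator."""
    if force_cpu:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


# Force the CPU to rule out the MPS backend as the source of the problem.
device = pick_device(force_cpu=True)
print(f"Using {device} device")

# A tiny stand-in model, only to show where the device string is used.
model = torch.nn.Linear(784, 10).to(device)
x = torch.rand(1, 784, device=device)
print(model(x).shape)
```

Running the tutorial once with `force_cpu=True` and once without gives a direct comparison of the two backends on the same machine.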

Interesting, @AlphaGJW can you try PyTorch-2.1? Though to be frank, I cannot reproduce the failure with either pytorch-2.0.1 or pytorch-2.1.0 on my Mac M2 Pro (not a ProMax) running Sonoma. Perhaps that would be useful information to include.

malfet • Oct 26 '23 16:10
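
Since the failure did not reproduce on a different machine and OS release, it may help to attach the exact versions to the report. A minimal sketch, assuming a hypothetical helper `environment_report` that gathers the values that seem relevant here:

```python
import platform

import torch


def environment_report() -> dict:
    """Collect version and backend details that are useful in a bug report."""
    return {
        "torch": torch.__version__,
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "mps_built": torch.backends.mps.is_built(),
        "mps_available": torch.backends.mps.is_available(),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

On macOS, the `os` entry includes the OS release, which is the detail that differs between the two machines discussed in this thread.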