Accuracy is low for examples/train_math_net with cuda

Open npuichigo opened this issue 1 year ago • 5 comments

cargo run --release --features cuda
Iter 20649 Loss: 6.49 Acc: 0.17
Iter 20650 Loss: 6.48 Acc: 0.17
Iter 20651 Loss: 6.49 Acc: 0.17
Iter 20652 Loss: 6.48 Acc: 0.17
Iter 20653 Loss: 6.49 Acc: 0.17
Iter 20654 Loss: 6.48 Acc: 0.17
Iter 20655 Loss: 6.47 Acc: 0.17
Iter 20656 Loss: 6.48 Acc: 0.17
Iter 20657 Loss: 6.48 Acc: 0.17
Iter 20658 Loss: 6.48 Acc: 0.17
Iter 20659 Loss: 6.48 Acc: 0.17
Iter 20660 Loss: 6.48 Acc: 0.17
Iter 20661 Loss: 6.47 Acc: 0.17
Iter 20662 Loss: 6.47 Acc: 0.17
Iter 20663 Loss: 6.47 Acc: 0.17

npuichigo avatar May 05 '24 07:05 npuichigo

Agreed, I'm seeing the same thing. Will fix.

jafioti avatar May 05 '24 14:05 jafioti

I've added a small PR with a temporary fix, which may hint at where CUDA training in general is going wrong.

swfsql avatar Jun 24 '24 17:06 swfsql

Hmm, very interesting: your changes trigger a copy-back of the data to the CPU rather than keeping it on the GPU. I wonder why that makes it accurate. Sorry I haven't gotten around to looking at this in depth; I'll have time this weekend to check it out, and access to a CUDA machine.

jafioti avatar Jun 25 '24 03:06 jafioti

Could it be that the initial CudaCopyToDevice calls (made at the start of every iteration) are always overwriting the latest GPU weight values with the (static, initial) CPU weight values?

swfsql avatar Jun 25 '24 15:06 swfsql
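
For illustration, here is a minimal Rust sketch of that hypothesis (the names `initial_host_weights` and `device_weights` are hypothetical stand-ins, not luminal's actual API). If the upload really does run every iteration with the original host values, the device-side weights can never accumulate updates:

```rust
// Minimal sketch of the hypothesized failure mode (hypothetical names,
// not luminal's actual API): a host->device copy of the *initial* weights
// runs at the start of every iteration and clobbers whatever the previous
// iteration trained on the device, so the loss never improves.
fn main() {
    let initial_host_weights = vec![1.0_f32; 4]; // static CPU-side values
    let mut device_weights = vec![0.0_f32; 4];   // stand-in for a CUDA buffer

    for iter in 0..3 {
        // Hypothesized CudaCopyToDevice at the start of each iteration:
        // it always uploads the same initial CPU values.
        device_weights.copy_from_slice(&initial_host_weights);

        // The training step only updates the device-side copy.
        for w in &mut device_weights {
            *w -= 0.1;
        }

        // Every iteration ends with identical weights, because the update
        // from the previous iteration was overwritten by the upload.
        println!("iter {iter}: {device_weights:?}"); // always [0.9, 0.9, 0.9, 0.9]
    }
}
```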

I don't think so. Ops don't get run if their destination tensor has already been produced, so the copy to device shouldn't run again as long as the CUDA buffers weren't getting deleted first.

jafioti avatar Jun 28 '24 07:06 jafioti
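
As a rough illustration of that scheduling rule (a simplified stand-in, not luminal's real executor): an op can be skipped whenever its destination tensor already exists, which is why the copy-to-device should stay a no-op unless its output buffer gets deleted between iterations.

```rust
use std::collections::HashSet;

// Simplified sketch of "ops don't run if the destination tensor is already
// produced" (not the actual luminal executor). The copy op only runs again
// if its output buffer was dropped in the meantime.
struct Op {
    name: &'static str,
    output: &'static str, // id of the tensor this op produces
}

fn run_graph(ops: &[Op], produced: &mut HashSet<&'static str>) {
    for op in ops {
        if produced.contains(op.output) {
            println!("skip {}", op.name); // destination already exists
            continue;
        }
        println!("run  {}", op.name);
        produced.insert(op.output);
    }
}

fn main() {
    let ops = [
        Op { name: "CudaCopyToDevice(weights)", output: "weights_gpu" },
        Op { name: "MatMul(weights_gpu, x)", output: "logits" },
    ];
    let mut produced: HashSet<&'static str> = HashSet::new();

    run_graph(&ops, &mut produced); // first iteration: both ops run
    produced.remove("logits");      // intermediate results are cleared each step
    run_graph(&ops, &mut produced); // copy is skipped while weights_gpu survives
}
```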