llm.c
Adding CI check for exceeding loss tolerance
- Modified the Train (FP32) run with gpt2_124M.bin to apply the requested parameter changes in CI
- Added a check that fails if the loss varies by more than 5 percent. The threshold is configurable, so we can lower it if that's more appropriate.
Tested in CI
Sorry, I don't understand the history/context for this change. Is it following up on some conversation? Why are the args being changed around?
Yes, this was a suggestion in our Discord conversation. I just replied to it there for your reference. Let me know if this is still of interest.
This is the output of the test in CI - it fails if the loss isn't within the allowed percentage. The Fixed Value on the left comes from test_gpt2.cu.
Fixed Value: 5.270009, Read Value: 5.270006, Percent Difference: -0.00%
Fixed Value: 4.060681, Read Value: 4.060386, Percent Difference: -0.01%
Fixed Value: 3.320085, Read Value: 3.321317, Percent Difference: 0.04%
Fixed Value: 2.71755, Read Value: 2.718042, Percent Difference: 0.02%
Fixed Value: 2.181066, Read Value: 2.182476, Percent Difference: 0.06%
Fixed Value: 1.653923, Read Value: 1.654485, Percent Difference: 0.03%
Fixed Value: 1.16805, Read Value: 1.167975, Percent Difference: -0.01%
Fixed Value: 0.736873, Read Value: 0.736542, Percent Difference: -0.04%
Fixed Value: 0.401021, Read Value: 0.40138, Percent Difference: 0.09%
Fixed Value: 0.187493, Read Value: 0.188075, Percent Difference: 0.31%
Success: All values are within the allowed accuracy.