Adding CI check for exceeding loss tolerance

Open rosslwheeler opened this issue 1 year ago • 2 comments

  • Modified the Train (FP32) run with gpt2_124M.bin to add requested parameter changes in CI
  • Added a check that fails when the loss differs from the expected value by more than 5 percent. The threshold is configurable, so we can lower it if that's more appropriate.

Tested in CI

rosslwheeler, Jul 13 '24 08:07

Sorry, I don't understand the history/context for this change. Is it following up on some conversation? Why are the args being changed around?

karpathy, Jul 13 '24 17:07

Yes, this was a suggestion from our Discord conversation. I just replied to it there for your reference. Let me know if this is still of interest.

This is the output of the test in CI. It fails if any value isn't within the allowed percentage. The Fixed Value on the left is taken from test_gpt2.cu.

Fixed Value: 5.270009, Read Value: 5.270006, Percent Difference: -0.00%
Fixed Value: 4.060681, Read Value: 4.060386, Percent Difference: -0.01%
Fixed Value: 3.320085, Read Value: 3.321317, Percent Difference: 0.04%
Fixed Value: 2.71755, Read Value: 2.718042, Percent Difference: 0.02%
Fixed Value: 2.181066, Read Value: 2.182476, Percent Difference: 0.06%
Fixed Value: 1.653923, Read Value: 1.654485, Percent Difference: 0.03%
Fixed Value: 1.16805, Read Value: 1.167975, Percent Difference: -0.01%
Fixed Value: 0.736873, Read Value: 0.736542, Percent Difference: -0.04%
Fixed Value: 0.401021, Read Value: 0.40138, Percent Difference: 0.09%
Fixed Value: 0.187493, Read Value: 0.188075, Percent Difference: 0.31%
Success: All values are within the allowed accuracy.

rosslwheeler, Jul 13 '24 17:07
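
For readers following along, the check described in this thread can be sketched roughly as below. This is a minimal illustration, not the script from the PR: the function names `percent_difference` and `check_losses` and the `tolerance_percent` parameter are hypothetical, and the formula (signed difference relative to the fixed value) is an assumption that happens to reproduce the percentages in the output above.

```python
def percent_difference(fixed, read):
    """Signed difference of `read` relative to `fixed`, in percent (assumed formula)."""
    return (read - fixed) / fixed * 100.0


def check_losses(fixed_values, read_values, tolerance_percent=5.0):
    """Return True if every read loss is within tolerance of its fixed reference.

    `tolerance_percent` is a hypothetical name for the configurable threshold
    mentioned in the PR description (5 percent there).
    """
    all_ok = True
    for fixed, read in zip(fixed_values, read_values):
        diff = percent_difference(fixed, read)
        print(f"Fixed Value: {fixed}, Read Value: {read}, "
              f"Percent Difference: {diff:.2f}%")
        if abs(diff) > tolerance_percent:
            all_ok = False
    return all_ok


if __name__ == "__main__":
    # First and last pairs from the CI output in this thread.
    fixed = [5.270009, 0.187493]
    read = [5.270006, 0.188075]
    if check_losses(fixed, read):
        print("Success: All values are within the allowed accuracy.")
    else:
        raise SystemExit("Failure: at least one loss exceeded the tolerance.")
```

A CI step could run a script like this after training and rely on the non-zero exit code to fail the job. Lowering `tolerance_percent` tightens the check.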