llm.c
Adding CI check for exceeding loss tolerance
- Modified the Train (FP32) run with gpt2_124M.bin to apply the requested parameter changes in CI
- Added a check that fails if the loss varies by more than 5 percent. The threshold is configurable, so we can lower it if that's more appropriate.
Tested in CI
Sorry, I don't understand the history/context for this change. Is it following up on some conversation? Why are the args being changed around?
Yes, this was a suggestion in our Discord conversation. I just replied to it there for your reference. Let me know if this is still of interest.
This is the output of the test in CI - it fails if the loss isn't within the allowed percentage. The Fixed Value on the left comes from test_gpt2.cu.
Fixed Value: 5.270009, Read Value: 5.270006, Percent Difference: -0.00%
Fixed Value: 4.060681, Read Value: 4.060386, Percent Difference: -0.01%
Fixed Value: 3.320085, Read Value: 3.321317, Percent Difference: 0.04%
Fixed Value: 2.71755, Read Value: 2.718042, Percent Difference: 0.02%
Fixed Value: 2.181066, Read Value: 2.182476, Percent Difference: 0.06%
Fixed Value: 1.653923, Read Value: 1.654485, Percent Difference: 0.03%
Fixed Value: 1.16805, Read Value: 1.167975, Percent Difference: -0.01%
Fixed Value: 0.736873, Read Value: 0.736542, Percent Difference: -0.04%
Fixed Value: 0.401021, Read Value: 0.40138, Percent Difference: 0.09%
Fixed Value: 0.187493, Read Value: 0.188075, Percent Difference: 0.31%
Success: All values are within the allowed accuracy.