llm.c
Fix the potential int overflow related to B*T*V
With

```c
int B = 8;
int T = 1024;
int V = 50257;
```

the ratio of (8 * 1024 * 50257) to INT_MAX is 0.191715. Therefore, if we make B = 64, the product B * T * V = 3,293,642,752 exceeds INT_MAX = 2,147,483,647 and overflows.
64-bit indexing used to be a lot slower than 32-bit indexing (to the point that PyTorch ships two kernels for some things), so you might also want to check the perf impact. (I don't know if it happens in your case or with more recent GPUs, but I had an encounter many years ago that surprised me a lot.)
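As a rough, CPU-only sketch for sanity-checking this: the array size, the clock()-based timing, and the loop bodies below are all illustrative assumptions, and a GPU kernel would need its own CUDA benchmark.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1u << 24)  /* 16M floats, ~64 MB */

/* Sum with a 32-bit loop counter. */
static float sum_int(const float* a) {
    float s = 0.0f;
    for (int i = 0; i < (int)N; i++) s += a[i];
    return s;
}

/* Same loop with a 64-bit (size_t) counter. */
static float sum_szt(const float* a) {
    float s = 0.0f;
    for (size_t i = 0; i < (size_t)N; i++) s += a[i];
    return s;
}

int main(void) {
    float* a = malloc(N * sizeof(float));
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = 1.0f;
    clock_t t0 = clock();
    float s1 = sum_int(a);
    clock_t t1 = clock();
    float s2 = sum_szt(a);
    clock_t t2 = clock();
    printf("int:    sum=%.0f  %.3fs\n", s1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("size_t: sum=%.0f  %.3fs\n", s2, (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}
```

On current x86-64 compilers at -O2 the two loops usually produce near-identical code; the cost mentioned above was mainly a GPU-kernel concern, where 64-bit address arithmetic uses more registers.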
In lines like `const size_t N = (size_t)(B) * T * V;`, is the explicit cast needed?
Yes, assuming there is an overflow possibility: the values on the right are ints, so you'll want to upcast them to size_t (typically a 64-bit unsigned type). Otherwise the product is computed in int, and signed int overflow is undefined behavior in C. Based on a unit test I just ran, the compiler did NOT upcast and truncated the value on overflow, so the size_t cast on the right is needed.
The only question I have is whether B needs to be (B)? It seems not to matter if you leave it as B.
> In lines like `const size_t N = (size_t)(B) * T * V;`, is the explicit cast needed?
```c
#include <stdio.h>

int main(void) {
    int B = 64;
    int T = 1024;
    int V = 50257;
    size_t N1 = B * T * V;            // wrong: product computed in int, overflows
    size_t N2 = (size_t)(B) * T * V;  // correct version
    size_t N22 = (size_t) B * T * V;  // correct version
    int N3 = B * T * V;               // overflows in int
    size_t N4 = (size_t) N3;          // wrong: cast after the damage is done
    printf("N1: %zu\n", N1);
    printf("N2: %zu\n", N2);
    printf("N22: %zu\n", N22);
    printf("N3: %d\n", N3);
    printf("N4: %zu\n", N4);
    return 0;
}
```
The output:

```
N1: 18446744072708227072
N2: 3293642752
N22: 3293642752
N3: -1001324544
N4: 18446744072708227072
```
If we don't cast explicitly, as in N1, the product is first computed in the int range and then converted to size_t, which is exactly what N4 does; this is wrong. @rosslwheeler since both N2 and N22 are correct, the parentheses don't matter, but I'd like to keep the cast on B explicit.
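To spell out why the parentheses don't matter: the cast binds tighter than `*`, so both correct forms parse identically. A quick illustration (not from the PR):

```c
size_t a = (size_t)(B) * T * V;   /* cast applies to B only */
size_t b = (size_t)B * T * V;     /* identical: parses as ((size_t)B) * T * V */
size_t c = (size_t)(B * T * V);   /* different: product computed in int, then cast */
```

And for anyone checking the printed numbers: on a typical two's-complement machine the 32-bit product wraps to 3,293,642,752 - 2^32 = -1,001,324,544 (N3), and converting that negative int to a 64-bit size_t yields 2^64 - 1,001,324,544 = 18,446,744,072,708,227,072 (N1 and N4). Strictly, the signed overflow is undefined behavior, so even the wrapped value isn't guaranteed.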
@lancerts Is there an inherent speed improvement using ints vs. long ints? (Yes, I saw the comment from t-vi above.) Just curious.
Sorry, I meant the casts look ugly to my eye. Maybe we could make the individual params `size_t` in the function declarations 🤔, so their products will come out `size_t` as well. hmm
Ah, I asked the same thing above and again later. Can we just use `size_t` or `long int` in the variable definitions instead of `int` to get rid of the casting? Or is there a speed impact?
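For concreteness, something like this is what I have in mind (purely illustrative):

```c
size_t B = 64, T = 1024, V = 50257;  /* dims defined as size_t up front */
size_t N = B * T * V;                /* product computed in size_t, no casts */
```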
> Sorry, I meant the casts look ugly to my eye. Maybe we could make the individual params `size_t` in the function declarations 🤔, so their products will come out `size_t` as well. hmm
Agreed, B, T, V themselves are most likely within int range; I don't think they will overflow individually in any meaningful setup, so I didn't change the individual parameters to size_t.
The concern is: if we make them all size_t, does it slow down kernels in actual training? (A simple setup seems OK, but I can't be sure for general settings, i.e., the actual GPT training step.)
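For reference, a minimal sketch of the signature-level alternative discussed above. The function shape is illustrative (loosely modeled on llm.c's loss kernel), not the PR's actual diff:

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch: with the dims declared as size_t, all index math
 * in the body happens in size_t and no explicit casts are needed. */
void crossentropy_forward(float* losses, const float* probs,
                          const int* targets, size_t B, size_t T, size_t V) {
    for (size_t i = 0; i < B * T; i++) {       /* size_t loop bound, no overflow */
        const float* probs_i = probs + i * V;  /* offset also computed in size_t */
        losses[i] = -logf(probs_i[targets[i]]);
    }
}
```

The trade-off is exactly the perf question above: the body now does 64-bit index arithmetic throughout, whereas the cast-at-the-top approach keeps the parameters int and pays for the widening once per product.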