rosslwheeler
This one? int check_tensor(float *a, float *b, int n, const char* label) { This one generates a couple of warnings: test_gpt2.cu(139): warning #2464-D: conversion from a string literal to "char...
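For anyone following along, here is a minimal sketch of why the const char* form of the label parameter matters. This is not the actual check_tensor from test_gpt2.cu: the name check_tensor_sketch, the tol parameter, and the OK/MISMATCH output are made up for illustration. The assumption is that the warning comes from passing a string literal where a plain char* is expected, which C++ and CUDA compilers flag; taking const char* accepts literals cleanly.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a tensor checker (not the repo's code).
   Returns 1 if all n elements of a and b agree within tol, otherwise 0. */
static int check_tensor_sketch(const float *a, const float *b, int n,
                               const char *label, float tol) {
    assert(a != NULL && b != NULL && label != NULL && n >= 0);
    int ok = 1;
    for (int i = 0; i < n; i++) {
        float diff = a[i] - b[i];
        if (diff < 0.0f) {
            diff = -diff;  /* absolute difference without needing libm */
        }
        if (diff > tol) {
            ok = 0;
        }
    }
    /* label is only read, so const char* is the right type and
       string literals can be passed without a conversion warning. */
    printf("%s: %s\n", label, ok ? "OK" : "MISMATCH");
    return ok;
}
```

A call such as check_tensor_sketch(x, y, n, "dwte", 1e-2f) then passes the literal straight through, with no cast needed at the call site.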
Yes, assuming there is an overflow possibility - the values on the right are ints, so you'll want to upcast them to size_t (an unsigned type, typically 64 bits wide; note that long int is still 32 bits with MSVC). Otherwise, the behavior isn't specified...
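A rough sketch of the upcast, in case it helps. The helper flat_index and the [B, T, C] layout are hypothetical and only there to illustrate the point: if the operands stay int, the multiplication is carried out in int and can overflow before the result is ever widened, so the cast has to go on the operands rather than on the finished product.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flat offset of element (b, t, c) in a [B, T, C] tensor.
   All inputs are int, as in the code under discussion. */
static size_t flat_index(int b, int t, int c, int T, int C) {
    assert(b >= 0 && t >= 0 && c >= 0 && T > 0 && C > 0);
    /* Cast each operand first so every multiply and add is done in size_t.
       Writing (size_t)(b * T * C) would be too late: the int product
       would already have overflowed. */
    return ((size_t)b * (size_t)T + (size_t)t) * (size_t)C + (size_t)c;
}
```

With a 64-bit size_t, flat_index(70000, 0, 0, 70000, 1) gives 4900000000, which does not fit in a 32-bit int.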
@lancerts Is there an inherent speed improvement from using ints vs. long ints? (Yes, I saw the comment from t-vi above.) Just curious.
Ah - I was asking the same thing in a later comment too. Can we just use size_t or long int in the variable definitions instead of int to get rid of the casting?...
My issue with adding pragmas to source files (OpenMP excluded) is that you will keep adding more per platform/compiler. One suggestion was to split this function off into its own...
@azret - what about using C11 Atomics?
Sure - will let you know.
@azret - didn't get any speedup - was it supposed to? This is the same as your code above (basically). This will need the MSVC OpenMP changes in the source...
@jonathanmarvens - no worries. We're just experimenting at this point.
You forgot to move the ints out of the loop :-) They are the same... 1045264.5000006 1045264.5000006