ggml
ggml copied to clipboard
ggml : add macros for accessing tensors to reduce code duplication
The following pattern is repeated extensively throughout ggml.c:
const int64_t ne00 = src0->ne[0];
const int64_t ne01 = src0->ne[1];
const int64_t ne02 = src0->ne[2];
const int64_t ne03 = src0->ne[3];
const int64_t ne10 = src1->ne[0];
const int64_t ne11 = src1->ne[1];
const int64_t ne12 = src1->ne[2];
const int64_t ne13 = src1->ne[3];
const int64_t ne0 = dst->ne[0];
const int64_t ne1 = dst->ne[1];
const int64_t ne2 = dst->ne[2];
const int64_t ne3 = dst->ne[3];
const int nb00 = src0->nb[0];
const int nb01 = src0->nb[1];
const int nb02 = src0->nb[2];
const int nb03 = src0->nb[3];
const int nb10 = src1->nb[0];
const int nb11 = src1->nb[1];
const int nb12 = src1->nb[2];
const int nb13 = src1->nb[3];
const int nb0 = dst->nb[0];
const int nb1 = dst->nb[1];
const int nb2 = dst->nb[2];
const int nb3 = dst->nb[3];
const int ith = params->ith;
const int nth = params->nth;
It should be turned into macro(s) and while at it - we should fix all integer types. In the example above, the nb values are incorrectly being cast to int instead of size_t
Hi @ggerganov , I would like to work on this issue , as a way of bootstrapping and learning ggml. I have a PR where I have tried to address the issue. Please take a look at it and give me feedback and I will try to solve this in the best way possible.
@GoWind: I have been thinking about this one, too. Maybe @ggerganov means something along the lines of
struct test {
int ne[4];
};
#define GGML_LOCALS_1(type, prefix, pointer, array) \
const type prefix##0 = (pointer)->array[0];
#define GGML_LOCALS_2(type, prefix, pointer, array) \
GGML_LOCALS_1(type, prefix, pointer, array) \
const type prefix##1 = (pointer)->array[1];
#define GGML_LOCALS_3(type, prefix, pointer, array) \
GGML_LOCALS_2(type, prefix, pointer, array) \
const type prefix##2 = (pointer)->array[2];
#define GGML_LOCALS_4(type, prefix, pointer, array) \
GGML_LOCALS_3(type, prefix, pointer, array) \
const type prefix##3 = (pointer)->array[3];
int main(int argc) {
struct test src;
GGML_LOCALS_4(int, ne, &src, ne)
}
This should help to get rid of most duplication. Not sure what to do about
const int ith = params->ith;
const int nth = params->nth;
though. Open questions:
- how should we call these macros?
- could more macro magic be useful?
@goerch
Yes - I was thinking something along those lines exactly. The goal is to reduce code duplication and probability for errors, while the code still remains readable and easy to extend.
how should we call these macros?
Open to suggestions. GGML_LOCALS_X is nice
could more macro magic be useful?
I dunno - as long as it's not super complicated :) We can do the refactoring in a few passes if necessary, gradually deduplicating the patterns.
Regarding the thread stuff: it might be part of #291 Or we can leave it as it is for now. Not sure yet
@ggerganov : possible simplification:
struct test {
int ne[4];
};
#define GGML_LOCALS_1(prefix, pointer, array) \
const auto prefix##0 = (pointer)->array[0];
#define GGML_LOCALS_2(prefix, pointer, array) \
GGML_LOCALS_1(prefix, pointer, array) \
const auto prefix##1 = (pointer)->array[1];
#define GGML_LOCALS_3(prefix, pointer, array) \
GGML_LOCALS_2(prefix, pointer, array) \
const auto prefix##2 = (pointer)->array[2];
#define GGML_LOCALS_4(prefix, pointer, array) \
GGML_LOCALS_3(prefix, pointer, array) \
const auto prefix##3 = (pointer)->array[3];
int main(int argc) {
struct test src;
GGML_LOCALS_4(ne, &src, ne)
}
The non auto version I think is better.
It's better to see the type explicitly so we always know what it is and take it into account
Now that #309 is merged: does it make sense if I adapt other accelerators without being able to test them (maybe I get OpenCL working)? Does CI cover enough of them?
The CI does not cover it yet, but if you make the PR in the llama.cpp repo, people will quickly test it out.
I will now sync the current ggml into llama.cpp in the next hour