
struct ggml_tensor -> add a struct meta pointer into it (can replace padding)

Open cmp-nct opened this issue 1 year ago • 4 comments

Given that the tensor struct uses padding, it's not nice to add any more information into it. It currently has a static 8-byte padding at the end; that's perfect to replace with a pointer to a struct that stores additional information, for example an optional human-readable tensor name (to be used in the graph print), or a couple of uint8_t flags to switch features on or off per tensor:

- use_cublas=0 (this would allow fine-grained control over the library instead of hard-coding behavior through a define flag)
- performance_type=HIGH/MID/LOW
- threads_override=2

use_cublas could be initialized to 1 or 0 depending on the define.

I'd also move the task-scheduling n_task out, along with the performance stuff. The overhead of accessing a compact external struct should be zero.

I suppose all that stuff could also be added directly into the tensor struct, but if we have to keep it aligned that's not nice, especially given 64-bit and 32-bit environments with different pointer sizes etc.
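To make the proposal concrete, here is a minimal sketch of the padding-to-pointer idea. All names here are illustrative, not ggml's actual API; the point is only that the pointer lives where the 8-byte padding used to be, and the metadata itself sits in an out-of-band struct:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: the trailing padding of a tensor-like struct
 * is replaced by a pointer to an out-of-band metadata struct. */
struct my_tensor_meta {
    char    *name;        /* optional human-readable name */
    uint8_t  use_cublas;  /* per-tensor feature toggle    */
    uint8_t  performance; /* 0 = LOW, 1 = MID, 2 = HIGH   */
};

struct my_tensor {
    int64_t ne[4];               /* shape                */
    void   *data;                /* payload              */
    struct my_tensor_meta *meta; /* was: char padding[8] */
};

/* Falls back to "noname", matching the debug output below. */
const char *my_tensor_name(const struct my_tensor *t) {
    return (t->meta && t->meta->name) ? t->meta->name : "noname";
}
```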

Just as an example, giving tensors a name can look like this in the debug output:

 - 842: [    4096,       1,       1]          MUL_MAT   (  1) cpu =   0.000 /   0.000 ms, wall =   0.595 /   0.595 ms [wo x cur 22]
 - 843: [    4096,       1,       1]              ADD   (  1) cpu =   0.000 /   0.000 ms, wall =   0.008 /   0.008 ms [noname]
 - 844: [    4096,       1,       1]         RMS_NORM   (  1) cpu =   0.000 /   0.000 ms, wall =   0.015 /   0.015 ms [noname]
 - 845: [    4096,       1,       1]              MUL   (  1) cpu =   0.000 /   0.000 ms, wall =   0.004 /   0.004 ms [noname]
 - 846: [   11008,       1,       1]          MUL_MAT   (  1) cpu =   2.000 /   2.000 ms, wall =   1.584 /   1.584 ms [w1 x cur 22] (Hot)
 - 847: [   11008,       1,       1]             SILU   (  1) cpu =   0.000 /   0.000 ms, wall =   0.219 /   0.219 ms [noname]
 - 848: [   11008,       1,       1]          MUL_MAT   (  1) cpu =   2.000 /   2.000 ms, wall =   1.643 /   1.643 ms [w3 x cur 22] (Hot)
 - 849: [   11008,       1,       1]              MUL   (  1) cpu =   0.000 /   0.000 ms, wall =   0.009 /   0.009 ms [noname]
 - 850: [    4096,       1,       1]          MUL_MAT   (  1) cpu =   3.000 /   3.000 ms, wall =   3.137 /   3.137 ms [w2 x cur 22] (Hot)

Without names it's unreadable; the image version looks OK, but in text form there are just too many operations to keep track of.

cmp-nct avatar Apr 20 '23 21:04 cmp-nct

Yes, good idea. I think it's better to just extend ggml_tensor instead of adding another level of indirection. In any case, this extension will happen soon, as I also see various use cases.

ggerganov avatar Apr 21 '23 18:04 ggerganov

Is the padding still relevant? Can I just extend it without causing problems?

Maybe a couple of structs (not pointers) in that case; separating the tensor struct into a few logical groups like performance, debug, and scheduler would make it a bit more accessible while still keeping everything together in one parent.

I am a bit confused about padding in general, because on 32-bit the pointer size is 4 bytes, so I am not sure the static 8-byte padding is right for all hardware.

Currently I have this one in use. The name is set by an optional function call:

typedef struct
{
    struct ggml_tensor *tensor; // points back to its tensor
    char *name;                 // an optional name

    uint8_t flag_high_performance; // whether this tensor is high-performance (relevant for hybrid-core systems like Intel 12th/13th gen)
    uint8_t flag_use_blas_cuda;    // whether CUDA is enabled for this tensor (destination tensor)
} tensor_meta;

cmp-nct avatar Apr 21 '23 20:04 cmp-nct

Is the padding still relevant? Can I just extend it without causing problems?

The padding is just there to make sure these asserts hold:

https://github.com/ggerganov/llama.cpp/blob/5f939498d517b4dddbe904f202e895a3ecfb9dc4/ggml.c#L3403-L3404
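As an illustration of what those asserts enforce (field names and alignment value here are made up, not ggml's actual layout): the explicit trailing padding rounds sizeof(struct) up to a multiple of the memory alignment, and a compile-time assert then locks that in:

```c
#include <assert.h>
#include <stdint.h>

#define MEM_ALIGN 16 /* illustrative alignment target */

struct padded_example {
    int64_t a;   /* 8 bytes                        */
    void   *b;   /* 8 bytes on 64-bit, 4 on 32-bit */
    int32_t c;   /* 4 bytes                        */
    /* pad the tail so the total size is a multiple of MEM_ALIGN */
    char padding[MEM_ALIGN - (8 + sizeof(void *) + 4) % MEM_ALIGN];
};

/* The same style of compile-time check as the asserts linked above. */
_Static_assert(sizeof(struct padded_example) % MEM_ALIGN == 0,
               "padded_example size must be a multiple of MEM_ALIGN");
```

The padding expression adapts to the pointer size, which is exactly the 32-bit vs 64-bit concern raised earlier in the thread.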

Maybe a couple structs (non pointers) in that case

I think we want extra members without wrapping them in extra structures; just streamline it as much as possible. Also, no char * pointers: names should have a predetermined max length (e.g. 16 or 32) so that the memory size per tensor is fixed.
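A minimal sketch of that fixed-length-name suggestion, with illustrative names (ggml's eventual implementation may differ in the constant and function names):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 32 /* assumed cap, per the "e.g. 16 or 32" above */

struct named_tensor {
    char name[MAX_NAME]; /* fixed size: memory per tensor stays constant */
};

/* snprintf truncates instead of overflowing the fixed buffer. */
void named_tensor_set_name(struct named_tensor *t, const char *name) {
    snprintf(t->name, sizeof(t->name), "%s", name);
}
```

Because the array is embedded rather than pointed to, sizeof(struct named_tensor) is known at compile time and the per-tensor memory footprint never varies with the name.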

ggerganov avatar Apr 22 '23 10:04 ggerganov

The static_assert(sizeof(...)) seems a bit dubious; it should probably be static_assert(_Alignof(...)) (which currently fails). If there were a clean and portable way to specify struct alignment, we could apply it to these two structs and remove the explicit padding bytes.
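The distinction can be shown with a small, self-contained example (not ggml's actual struct): sizeof constrains the total size, _Alignof the alignment requirement, and trailing char padding affects only the former.

```c
#include <assert.h>
#include <stdint.h>

struct sized_example {
    int64_t a;       /* the strictest-aligned member      */
    char padding[8]; /* raises the size, not the alignment */
};

/* Holds: 8 + 8 = 16 bytes total. */
_Static_assert(sizeof(struct sized_example) == 16, "size is 16");

/* The padding does not raise the struct's alignment; it stays at
 * that of the strictest member, here _Alignof(int64_t). */
_Static_assert(_Alignof(struct sized_example) == _Alignof(int64_t),
               "char padding does not change alignment");
```

So a sizeof-based assert can pass even when the struct's actual alignment is weaker than the value the padding was meant to guarantee, which is the concern being raised here.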

sw avatar Apr 28 '23 13:04 sw

Decided to add void * extra; after all.

ggerganov avatar Jul 28 '23 19:07 ggerganov
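For reference, the adopted resolution can be sketched like this; everything except the extra field itself is illustrative. A single generic pointer lets a backend attach arbitrary per-tensor data without fixing any layout in the tensor struct:

```c
#include <assert.h>

struct tensor_like {
    void *data;
    void *extra; /* backend-specific payload, 0/NULL when unused */
};

/* Hypothetical backend payload hung off the extra pointer. */
struct cuda_extra {
    int device_id;
};
```

This keeps the tensor struct's size and alignment stable while still allowing the per-tensor metadata discussed throughout the thread.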