llama.cpp
struct ggml_tensor -> add a struct meta pointer into it (can replace padding)
Given that the tensor struct uses padding, it's not nice to add any more information into it. It currently has a static 8-byte padding at the end; that's perfect to be replaced by a pointer to a struct that stores additional information, for example an optional human-readable tensor name (to be used in graph print) or a couple of uint8_t flags to switch features on or off per tensor. For example: use_cublas=0. This would allow fine-grained control over the usage of such a library instead of hard-coding it through a define flag. Further examples: performance_type=HIGH/LOW/MID, threads_override=2.
use_cublas could be initialized to 1/0 depending on the define.
I'd also move the task scheduling n_task out, as well as the performance stuff. The overhead of accessing a compact external struct should be zero.
I suppose all that stuff could also be added directly into the tensor struct, but if we have to keep it aligned that's not nice, especially given 64/32-bit environments with different pointer sizes etc.
Just as an example, giving tensors a name can look like that in the debug output:
- 842: [ 4096, 1, 1] MUL_MAT ( 1) cpu = 0.000 / 0.000 ms, wall = 0.595 / 0.595 ms [wo x cur 22]
- 843: [ 4096, 1, 1] ADD ( 1) cpu = 0.000 / 0.000 ms, wall = 0.008 / 0.008 ms [noname]
- 844: [ 4096, 1, 1] RMS_NORM ( 1) cpu = 0.000 / 0.000 ms, wall = 0.015 / 0.015 ms [noname]
- 845: [ 4096, 1, 1] MUL ( 1) cpu = 0.000 / 0.000 ms, wall = 0.004 / 0.004 ms [noname]
- 846: [ 11008, 1, 1] MUL_MAT ( 1) cpu = 2.000 / 2.000 ms, wall = 1.584 / 1.584 ms [w1 x cur 22] (Hot)
- 847: [ 11008, 1, 1] SILU ( 1) cpu = 0.000 / 0.000 ms, wall = 0.219 / 0.219 ms [noname]
- 848: [ 11008, 1, 1] MUL_MAT ( 1) cpu = 2.000 / 2.000 ms, wall = 1.643 / 1.643 ms [w3 x cur 22] (Hot)
- 849: [ 11008, 1, 1] MUL ( 1) cpu = 0.000 / 0.000 ms, wall = 0.009 / 0.009 ms [noname]
- 850: [ 4096, 1, 1] MUL_MAT ( 1) cpu = 3.000 / 3.000 ms, wall = 3.137 / 3.137 ms [w2 x cur 22] (Hot)
Without names it's unreadable; the image version looks OK, but in text form there are just too many operations to keep track of.
Yes, good idea. I think it's better to just extend ggml_tensor instead of adding another level of indirection.
But anyway, this extension will happen soon, as I also see various use cases.
Is the padding still relevant? Can I just extend it without causing problems?
Maybe a couple of structs (non-pointers) in that case; separating the tensor struct into a few logical groups like performance, debug, and scheduler would make it a bit more accessible while still keeping everything together in one parent.
I am a bit confused about padding in general, because on 32-bit systems the pointer size is 4 bytes, so I am not sure the static 8-byte padding is right for all hardware.
Currently I have this one in use. The name is set by an optional function call:
typedef struct
{
    struct ggml_tensor *tensor;    // points back to its tensor
    char *name;                    // an optional name
    uint8_t flag_high_performance; // whether this tensor is high performance (relevant for hybrid-core systems like Intel 12th/13th gen)
    uint8_t flag_use_blas_cuda;    // whether CUDA is enabled for this tensor (destination tensor)
} tensor_meta;
> Is the padding still relevant? Can I just extend it without causing problems?
Padding is just to make sure these asserts hold:
https://github.com/ggerganov/llama.cpp/blob/5f939498d517b4dddbe904f202e895a3ecfb9dc4/ggml.c#L3403-L3404
> Maybe a couple of structs (non-pointers) in that case
I think we want extra members without wrapping them in extra structures. Just streamline it as much as possible.
Also, no char * pointers. Names should have a predetermined max length (e.g. 16 or 32) so that the memory size per tensor is fixed.
The static_assert(sizeof( ... seems a bit dubious; it should probably be static_assert(_Alignof( ... (which currently fails). If there were a clean and portable way to specify struct alignment, we could apply that to these two structs and remove the explicit padding bytes.
Decided to add void * extra; after all.