ggml
Parallelize unary tanh on cpu, generalize ADD to allow more shapes
I'm working on a project that needs these operations. tanh is now parallelized in the same manner as the other unary ops.
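For readers unfamiliar with how unary ops are usually split across CPU threads, here is a minimal sketch of the row-partitioning idea: each thread takes a contiguous slice of rows and applies the op to it. This is an illustration, not the code in this PR. The names `unary_rows_for_thread` and `relu_f32` are hypothetical, and the tensor is assumed to be a contiguous row-major 2D float array.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in element-wise op so the sketch has no libm dependency.
   For this PR the op would be tanhf from <math.h>. */
static float relu_f32(float x) { return x > 0.0f ? x : 0.0f; }

/* Hypothetical helper: thread `ith` of `nth` applies `op` to its own
   slice of rows. Assumes contiguous row-major data. */
static void unary_rows_for_thread(float *dst, const float *src,
                                  int nrows, int ncols,
                                  int ith, int nth,
                                  float (*op)(float)) {
    assert(nth > 0 && ith >= 0 && ith < nth);

    const int dr  = (nrows + nth - 1) / nth;             /* rows per thread, rounded up */
    const int ir0 = dr * ith;                            /* first row for this thread */
    const int ir1 = ir0 + dr < nrows ? ir0 + dr : nrows; /* one past the last row */

    /* If ir0 >= ir1 this thread has nothing to do and the loop is skipped. */
    for (int r = ir0; r < ir1; ++r) {
        for (int c = 0; c < ncols; ++c) {
            const size_t i = (size_t)r * ncols + c;
            dst[i] = op(src[i]);
        }
    }
}
```

Every thread calls the helper with the same tensor and its own `ith`. Because the row ranges do not overlap, no locking is needed.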
ADD is generalized to require only the ggml_can_repeat constraint instead of ggml_can_repeat_rows. This was done by adding two extra branches to the function. One of them handles the most general case and is likely very slow. The other is optimized for my project's needs (adding MxN and 1xP tensors) and uses ggml_vec_add1_f32.
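The two branches can be pictured with a simplified 2D sketch. It is not the code in this PR. `add_broadcast_2d` is a hypothetical name, `vec_add1_f32` is a local stand-in that mirrors what ggml_vec_add1_f32 does (add one scalar to every element of a vector), and both operands are assumed to be contiguous row-major float arrays whose sizes divide evenly.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for ggml_vec_add1_f32: z[i] = x[i] + v. */
static void vec_add1_f32(int n, float *z, const float *x, float v) {
    for (int i = 0; i < n; ++i) {
        z[i] = x[i] + v;
    }
}

/* Hypothetical sketch: dst = a + b, with b repeated along both dimensions.
   Assumes a_rows % b_rows == 0 and a_cols % b_cols == 0. */
static void add_broadcast_2d(float *dst,
                             const float *a, int a_rows, int a_cols,
                             const float *b, int b_rows, int b_cols) {
    assert(a_rows % b_rows == 0 && a_cols % b_cols == 0);

    for (int r = 0; r < a_rows; ++r) {
        const float *a_row = a   + (size_t)r * a_cols;
        const float *b_row = b   + (size_t)(r % b_rows) * b_cols;
        float       *d_row = dst + (size_t)r * a_cols;

        if (b_cols == 1) {
            /* Fast path: b contributes one scalar per row. */
            vec_add1_f32(a_cols, d_row, a_row, b_row[0]);
        } else {
            /* General path: wrap the column index on every element. */
            for (int c = 0; c < a_cols; ++c) {
                d_row[c] = a_row[c] + b_row[c % b_cols];
            }
        }
    }
}
```

The general path pays for a modulo on every element, which is why it is the slow one. The fast path reduces each row to a single vectorizable loop.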