ggml Parallelize unary tanh on cpu, generalize ADD to allow more shapes

Open audiovention opened this issue 1 year ago • 0 comments

I'm working on a project that needed these operations. tanh was parallelized in the same manner as the other unary ops.

ADD is generalized to accept the ggml_can_repeat constraint instead of ggml_can_repeat_rows. This was done by adding two extra branches to the function. One handles the most general case and is likely very slow. The other is optimized for my project's needs (adding MxN and 1xP tensors) and uses ggml_vec_add1_f32.
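To make the difference between the two constraints concrete, here is a standalone sketch of the shape checks as I understand them, plus a scalar-per-row add in the spirit of the optimized branch. Everything here is illustrative: `can_repeat`, `can_repeat_rows`, `vec_add1_f32`, and `add_row_scalars_f32` are hypothetical stand-ins rather than the ggml functions themselves, shapes are plain 4-element arrays with index 0 as the row length, and the assumption that the second operand contributes one scalar per row is mine (the description does not say how P relates to M and N).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DIMS 4

// Looser check: every dimension of `big` must be a multiple of the
// corresponding dimension of `small`, including the row length (index 0).
static bool can_repeat(const int64_t small[MAX_DIMS], const int64_t big[MAX_DIMS]) {
    for (int i = 0; i < MAX_DIMS; i++) {
        assert(small[i] > 0);
        if (big[i] % small[i] != 0) return false;
    }
    return true;
}

// Stricter check: row lengths must match exactly; only the higher
// dimensions are allowed to repeat.
static bool can_repeat_rows(const int64_t small[MAX_DIMS], const int64_t big[MAX_DIMS]) {
    if (small[0] != big[0]) return false;
    for (int i = 1; i < MAX_DIMS; i++) {
        assert(small[i] > 0);
        if (big[i] % small[i] != 0) return false;
    }
    return true;
}

// z[i] = x[i] + v for a whole row: a vector plus one scalar.
static void vec_add1_f32(int64_t n, float *z, const float *x, float v) {
    for (int64_t i = 0; i < n; i++) z[i] = x[i] + v;
}

// Assumed layout: `a` has nr contiguous rows of nc floats and `b` holds one
// scalar per row, so each output row is a vector-plus-scalar add.
static void add_row_scalars_f32(float *dst, const float *a, const float *b,
                                int64_t nr, int64_t nc) {
    for (int64_t r = 0; r < nr; r++) {
        vec_add1_f32(nc, dst + r * nc, a + r * nc, b[r]);
    }
}
```

A shape such as {1, 3, 1, 1} against {8, 3, 1, 1} passes `can_repeat` but fails `can_repeat_rows`, which is the kind of case the stricter check rejects.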

Oct 13 '23 09:10 audiovention
