ggml
Tanh is not implemented
Hi there,
Apparently the tanh function is not implemented in the library. Is this by design? Which function can I use in its place?
Thank you for your help.
It just hasn't been needed yet.
You can either submit a PR implementing it, or you can use the existing ggml_map_unary_f32()
which allows you to write custom operators in your project:
https://github.com/ggerganov/ggml/blob/db5eef149d569604a98708f6059dce63c6b9af1d/include/ggml/ggml.h#L953-L958
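Is the following implementation correct?
#include <math.h>

// Element-wise tanh callback for ggml_map_unary_f32().
void tanh_op(const int size, float *out, const float *in) {
    for (int i = 0; i < size; i++) {
        out[i] = tanhf(in[i]);  // tanhf() stays in single precision
    }
}
inpL = ggml_map_unary_f32(ctx0, inpL, tanh_op);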
Does it give the results you expect?
Yes, it does. I'm not sure whether it is optimized code, though.
It looks perfectly fine to me. The only way you could make it faster is by processing multiple elements at once, for example with multithreading or SIMD.
Basically it seems optimal to me.
Thanks. I want it single-threaded. Should I close this issue?
If you don't need ggml to include a built-in operator for this, sure.
Yes, it does. Not sure if it is an optimized code.
Most compilers will detect the simple for loop and may unroll it or use SIMD to make it faster at higher optimization levels. :) So it looks perfectly fine. The only way to go faster here might be a tanh() that is faster but less accurate...
This is what float16 is for :) It has only 65,536 possible bit patterns, which is small enough that you can use a lookup table.
From the looks of it, #316 fixes this?