ggml Add custom mapping functions

The current mapping functions are basically jokes, so add some real ones. These get access to the actual tensor structs, so they can do things like:

- Know the dimensions they are operating on
- Work with tensors that have more than 2 dimensions, or that are transposed
- Operate on two differently sized tensors (like matmul)
- Use their own thread pool that does a better job than ggml does

Among other things ...
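To illustrate why access to the tensor structs matters, here is a minimal sketch of a callback that reads dimensions and byte strides directly. `toy_tensor`, `toy_at`, and `toy_add_row_bias` are hypothetical names invented for this example; the struct is a simplified stand-in and not the real ggml tensor type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a tensor struct (assumption,
 * not the real ggml type): 2 dimensions, f32 data only. */
struct toy_tensor {
    int64_t ne[2]; /* element counts: ne[0] = columns, ne[1] = rows */
    size_t  nb[2]; /* strides in bytes for each dimension */
    void   *data;
};

/* Address of element (i0, i1). Going through the strides means a
 * transposed view works without copying the data first. */
float *toy_at(const struct toy_tensor *t, int64_t i0, int64_t i1) {
    return (float *)((char *)t->data + (size_t)i0 * t->nb[0] + (size_t)i1 * t->nb[1]);
}

/* Callback over two differently sized tensors: adds the single row of
 * b to every row of a. It needs the shapes of both inputs, which a
 * callback that only sees one row at a time would not have. */
void toy_add_row_bias(struct toy_tensor *dst,
                      const struct toy_tensor *a,
                      const struct toy_tensor *b) {
    for (int64_t i1 = 0; i1 < a->ne[1]; i1++) {
        for (int64_t i0 = 0; i0 < a->ne[0]; i0++) {
            *toy_at(dst, i0, i1) = *toy_at(a, i0, i1) + *toy_at(b, i0, 0);
        }
    }
}
```

Because every access goes through `nb`, the same callback handles a contiguous matrix and a transposed view of one.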

Jun 16 '23 04:06 LoganDark

The latest merge conflict nearly gave us a panic attack

Jun 19 '23 20:06 LoganDark

@ggerganov what do we have to do to get this PR merged? Is the CI failure important?

-Emily

Jun 22 '23 01:06 LoganDark

Nothing on your side, I'll merge this later today

Jun 22 '23 04:06 ggerganov

> Nothing on your side, I'll merge this later today

So how's today going so far? lol

Jun 24 '23 16:06 LoganDark

> support multi-thread operators

> pass ggml_compute_params to the callback

Indeed these are things that I considered, but my impression was that ggml_compute_params was an unstable internal implementation detail, and the header file API should not necessarily expose that. I optimized this for "low probability of having to change in the future" rather than raw expressivity (other than being able to access the raw tensors, of course).

Jun 24 '23 19:06 LoganDark

Initially, I didn't plan to have it as part of the public API, but at some point it was exposed in order to support the CUDA implementation. So now that we have it public anyway, we can make use of it in the custom operators.
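As a rough sketch of what passing compute parameters to a callback could enable, the example below splits rows across threads. It assumes the parameters carry a thread index and a thread count; `toy_params`, `toy_row_range`, and `toy_square_rows` are hypothetical names for illustration and do not reproduce ggml's actual struct or API.

```c
#include <stdint.h>

/* Hypothetical per-thread parameters (assumption: a thread index and a
 * thread count are available to the callback). */
struct toy_params {
    int ith; /* index of the calling thread, 0 <= ith < nth */
    int nth; /* total number of threads */
};

/* Compute the half-open row range [*ir0, *ir1) owned by this thread. */
void toy_row_range(const struct toy_params *p, int64_t nr,
                   int64_t *ir0, int64_t *ir1) {
    const int64_t dr = (nr + p->nth - 1) / p->nth; /* rows per thread, rounded up */
    int64_t lo = dr * p->ith;
    if (lo > nr) lo = nr;                          /* surplus threads get an empty range */
    int64_t hi = lo + dr;
    if (hi > nr) hi = nr;
    *ir0 = lo;
    *ir1 = hi;
}

/* Callback invoked once per thread: squares only the rows this thread
 * owns. Assumes contiguous row-major f32 data. */
void toy_square_rows(float *dst, const float *src,
                     int64_t n_cols, int64_t n_rows,
                     const struct toy_params *p) {
    int64_t ir0, ir1;
    toy_row_range(p, n_rows, &ir0, &ir1);
    for (int64_t r = ir0; r < ir1; r++) {
        for (int64_t c = 0; c < n_cols; c++) {
            const float x = src[r * n_cols + c];
            dst[r * n_cols + c] = x * x;
        }
    }
}
```

Each thread calls the same callback with its own `ith`, so the row ranges never overlap and no locking is needed for this kind of element-wise work.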

I guess we can now also obsolete the old "unary" and "binary" mappings, as they are a subset of the new custom ops.

Jun 24 '23 19:06 ggerganov

> they are a subset of the new custom ops

Not really, those are for pure functions that can be parallelized by row. They are useful to keep around while the user can't necessarily do this themselves.

Basically, those are the "managed" equivalent of these new "unmanaged" operations
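A minimal sketch of that distinction, using hypothetical names (`toy_row_fn`, `toy_raw_fn`, `toy_map_managed`, `toy_map_unmanaged`) that are not ggml's API: in the managed form the library owns the row loop and the user supplies a pure per-row function, while in the unmanaged form the user gets the whole buffer and does everything.

```c
#include <stdint.h>

/* Managed: pure function over one row of n floats. */
typedef void (*toy_row_fn)(int n, float *dst, const float *src);

/* Unmanaged: the callback sees the whole (contiguous, row-major) buffer. */
typedef void (*toy_raw_fn)(float *dst, const float *src,
                           int64_t n_cols, int64_t n_rows);

/* The library drives the iteration. Rows are independent, so this loop
 * is the part a library could split across threads for the user. */
void toy_map_managed(toy_row_fn f, float *dst, const float *src,
                     int64_t n_cols, int64_t n_rows) {
    for (int64_t r = 0; r < n_rows; r++) {
        f((int)n_cols, dst + r * n_cols, src + r * n_cols);
    }
}

/* The library only hands over the data; looping and any threading are
 * the callback's responsibility. */
void toy_map_unmanaged(toy_raw_fn f, float *dst, const float *src,
                       int64_t n_cols, int64_t n_rows) {
    f(dst, src, n_cols, n_rows);
}

/* Fits the managed form: each output row depends only on its input row. */
void toy_negate_row(int n, float *dst, const float *src) {
    for (int i = 0; i < n; i++) dst[i] = -src[i];
}

/* Needs the unmanaged form: output row r reads a different input row. */
void toy_reverse_rows(float *dst, const float *src,
                      int64_t n_cols, int64_t n_rows) {
    for (int64_t r = 0; r < n_rows; r++) {
        for (int64_t c = 0; c < n_cols; c++) {
            dst[r * n_cols + c] = src[(n_rows - 1 - r) * n_cols + c];
        }
    }
}
```

The unmanaged form can express anything the managed one can, but the managed form gives the row iteration for free, which is the convenience argued for above.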

Jun 24 '23 21:06 LoganDark