
Add custom mapping functions

Open · LoganDark opened this issue 2 years ago · 3 comments

The current mapping functions are basically jokes; add some real ones. These new ones get access to the actual tensor structs, so they can do things like:

  • Know the dimensions they are operating on
  • Work with tensors with more than 2 dimensions, or transposed
  • Operate on two differently sized tensors (like matmul)
  • Use their own thread pool that does a better job than ggml does.

Among other things ...
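To make the first three points concrete, here is a minimal C sketch of an op that receives whole tensors. It does not use ggml itself: `mini_tensor`, `elem`, `custom_matmul`, `make_2d` and `transpose_view` are hypothetical names, and the struct only imitates the `ne` (elements per dimension, `ne[0]` innermost) and `nb` (byte strides) fields that ggml tensors carry. Because the callback reads shapes and strides itself, it can multiply two differently sized operands, loop over the extra dimensions, and accept a transposed view without copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a ggml-style tensor: ne[] holds the number of
 * elements per dimension (ne[0] innermost), nb[] the byte stride per dimension. */
typedef struct {
    int64_t ne[4];
    size_t  nb[4];
    void   *data;
} mini_tensor;

/* Address of element (i0, i1, i2, i3). Going through the strides means
 * transposed or otherwise non-contiguous views work without a copy. */
static float *elem(const mini_tensor *t, int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
    return (float *)((char *)t->data + i0 * t->nb[0] + i1 * t->nb[1]
                                     + i2 * t->nb[2] + i3 * t->nb[3]);
}

/* Hypothetical two-input custom op: dst = a x b for every (i2, i3) batch.
 * a is M x K (ne[0] = K, ne[1] = M), b is K x N (ne[0] = N, ne[1] = K),
 * dst is M x N. Seeing whole tensors lets the op validate shapes itself. */
static void custom_matmul(mini_tensor *dst, const mini_tensor *a, const mini_tensor *b) {
    const int64_t K = a->ne[0], M = a->ne[1], N = b->ne[0];
    assert(b->ne[1] == K && dst->ne[0] == N && dst->ne[1] == M);
    for (int64_t i3 = 0; i3 < dst->ne[3]; i3++)
    for (int64_t i2 = 0; i2 < dst->ne[2]; i2++)
    for (int64_t m = 0; m < M; m++)
    for (int64_t n = 0; n < N; n++) {
        float sum = 0.0f;
        for (int64_t k = 0; k < K; k++)
            sum += *elem(a, k, m, i2, i3) * *elem(b, n, k, i2, i3);
        *elem(dst, n, m, i2, i3) = sum;
    }
}

/* Contiguous 2-D float tensor (ne[2] = ne[3] = 1) over caller-owned data. */
static mini_tensor make_2d(float *data, int64_t ne0, int64_t ne1) {
    mini_tensor t = { {ne0, ne1, 1, 1}, {0}, data };
    t.nb[0] = sizeof(float);
    t.nb[1] = t.nb[0] * (size_t)ne0;
    t.nb[2] = t.nb[1] * (size_t)ne1;
    t.nb[3] = t.nb[2];
    return t;
}

/* Transposed view of a 2-D tensor: swap the first two sizes and strides. */
static mini_tensor transpose_view(mini_tensor t) {
    int64_t ne = t.ne[0]; t.ne[0] = t.ne[1]; t.ne[1] = ne;
    size_t  nb = t.nb[0]; t.nb[0] = t.nb[1]; t.nb[1] = nb;
    return t;
}
```

Swapping `ne[0]`/`ne[1]` and `nb[0]`/`nb[1]`, as `transpose_view` does, is all a transposed operand needs here; a row-at-a-time mapping function that only receives a pointer and a length cannot express this.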

LoganDark, Jun 16 '23 04:06

nearest merge conflict nearly gave us a panic attack

LoganDark, Jun 19 '23 20:06

@ggerganov what do we have to do to get this PR merged? Is the CI failure important?

-Emily

LoganDark, Jun 22 '23 01:06

Nothing on your side, I'll merge this later today

ggerganov, Jun 22 '23 04:06

> Nothing on your side, I'll merge this later today

So how's today going so far? lol

LoganDark, Jun 24 '23 16:06

>   • support multi-thread operators
>   • pass ggml_compute_params to the callback

Indeed, these are things I considered, but my impression was that ggml_compute_params was an unstable internal implementation detail that the public header API should not necessarily expose. I optimized this for "low probability of having to change in the future" rather than raw expressiveness (other than being able to access the raw tensors, of course).

LoganDark, Jun 24 '23 19:06

Initially, I didn't plan to have it as part of the public API, but at some point it was exposed in order to support the CUDA implementation. So, since it is public anyway, we can make use of it in the custom operators.
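Passing per-thread parameters to the callback is what lets a custom operator split its own work. The sketch below is hypothetical and self-contained rather than ggml's API: `mini_params` only mimics a thread index `ith` and a thread count `nth` (the kind of information `ggml_compute_params` carries), `scale_rows` is an illustrative op, and `run_op` calls it once per index in sequence instead of on real threads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call parameters: this invocation's thread index and
 * the total number of threads working on the op. */
typedef struct {
    int ith;
    int nth;
} mini_params;

/* Hypothetical multi-threaded custom op: multiplies row r of a contiguous
 * nrows x ncols matrix by (r + 1). Each thread writes only its own block
 * of rows, so the threads need no synchronization. */
static void scale_rows(const mini_params *p, float *dst, const float *src,
                       int64_t nrows, int64_t ncols) {
    assert(p->nth > 0 && p->ith >= 0 && p->ith < p->nth);
    const int64_t dr  = (nrows + p->nth - 1) / p->nth;       /* rows per thread, rounded up */
    const int64_t ir0 = dr * p->ith;                          /* first row for this thread */
    const int64_t ir1 = ir0 + dr < nrows ? ir0 + dr : nrows;  /* one past its last row */
    for (int64_t r = ir0; r < ir1; r++)
        for (int64_t c = 0; c < ncols; c++)
            dst[r * ncols + c] = src[r * ncols + c] * (float)(r + 1);
}

/* Stand-in for the scheduler: invokes the op once per thread index.
 * The calls run one after another here; a thread pool would run them concurrently. */
static void run_op(int nth, float *dst, const float *src, int64_t nrows, int64_t ncols) {
    for (int ith = 0; ith < nth; ith++) {
        mini_params p = { ith, nth };
        scale_rows(&p, dst, src, nrows, ncols);
    }
}
```

Each invocation derives its own block of rows from `ith` and `nth`, so the op stays correct even with more threads than rows: the surplus invocations get an empty range and do nothing.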

I guess we can now also obsolete the old "unary" and "binary" mappings, as they are a subset of the new custom ops.

ggerganov, Jun 24 '23 19:06

> they are a subset of the new custom ops

Not really: those are for pure functions that can be parallelized by row. They're useful to keep around as long as users can't necessarily do that parallelization themselves.

Basically, those are the "managed" equivalent of these new "unmanaged" operations.
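The "managed" versus "unmanaged" distinction can be sketched like this (hypothetical names, not ggml's actual API). In the managed style the user supplies only a pure per-row function, here a ReLU taking `(n, dst, src)`, and the library-side driver decides which rows each thread processes. An unmanaged custom op would instead receive the tensors and the thread index and do that splitting itself.

```c
#include <assert.h>

/* "Managed" style: the user writes a pure function over one row of n floats.
 * It never sees tensor shapes, strides, or threads. */
typedef void (*row_fn)(int n, float *dst, const float *src);

/* Example pure row function: element-wise ReLU. */
static void relu_row(int n, float *dst, const float *src) {
    for (int i = 0; i < n; i++)
        dst[i] = src[i] > 0.0f ? src[i] : 0.0f;
}

/* Hypothetical library-side driver for thread ith of nth over a contiguous
 * nrows x ncols matrix. The library owns the row split, so every pure row
 * function gets parallelized the same way without extra user code. */
static void map_rows_managed(row_fn fn, int ith, int nth, float *dst,
                             const float *src, int nrows, int ncols) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    for (int r = ith; r < nrows; r += nth)  /* interleaved rows across threads */
        fn(ncols, dst + r * ncols, src + r * ncols);
}
```

Interleaving rows across threads is simply the shortest split to write; the point is that the row function never has to know about it.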

LoganDark, Jun 24 '23 21:06