keras-cv
Example custom c++ op (CPU)
/gcbrun
/gcbrun
/gcbrun
/gcbrun
This is now working in CI across all of our platforms, and I've also manually tested it on macOS.
Some todos are:
- Move custom op away from IoU3D and into some _internal object (I think it's worth getting this committed even before we have a real op so that I can set up the necessary internal build configuration). Probably a `ZeroOutTestLayer` is appropriate; see the sketch after this list.
- Update README to show how to build from source
- Update contributing guide to explain that building from source will be required to run custom op tests.
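For reference, here is a minimal sketch of what such an internal test layer could look like. The op name and library filename follow TensorFlow's canonical `zero_out` custom-op example and are assumptions, not the final keras_cv names.

```python
import tensorflow as tf

# Load the compiled kernel; this only works when keras_cv was built from source
# with custom ops enabled. The .so name here is a placeholder.
_custom_op_module = tf.load_op_library("_keras_cv_custom_ops.so")


class ZeroOutTestLayer(tf.keras.layers.Layer):
    """Internal-only layer used to exercise the custom-op build and test setup."""

    def call(self, inputs):
        # In the canonical TensorFlow example, zero_out keeps the first element
        # of the input and zeroes out the rest.
        return _custom_op_module.zero_out(inputs)
```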
@tanzhenyu
Are we sure we want to go in the direction of limited HW compatibility for the Keras libraries, considering also the overhead of maintaining custom ops?
I assumed that the scope of keras-* was to have closer interaction with the compiler team and to stress-test the technology, while still expressing our needs through TF/*HLO composability.
But I must also admit that recently, even in the "modern", (M)HLO-native JAX, something like custom ops is emerging again with https://github.com/google/jax/pull/12632 (XLA's CustomCall).
/Cc @andrew-leaver @jpienaar @paynecl
From https://github.com/google/jax/pull/12632/files#diff-b8191ca86b51eea58dc118d756de46521d77321338163145aea6b1ef64d126c5R77
> JAX uses XLA to compile staged-out Python programs, which are represented with MHLO. MHLO offers common operations used in machine learning, such as the dot-product, exponential function, and so on. Unfortunately, sometimes these operations (and compositions of these operations) are not sufficiently expressive or performant and users look to external libraries (Triton, Numba) or hand-written code (CUDA, C/C++) to implement operations for their desired program.
So are we already at the same point here with Keras-* as we were for many years with TF Addons and its custom-op proliferation (often 1K lines of code per PR for components that required custom ops with CPU- or CUDA-only support)? /cc @seanpmorgan
Hey @bhack -- thanks for the comments
This is for an existing internal use case which will require this custom op (the TF3D losses are insufficient) as well as a few other custom ops for 3D spatial data.
I also worry about package accessibility, and I want to make sure we package this in a way that does not add friction for pip install and lets users who don't want the custom ops install without binaries. I'm not certain what the right solution to that is just yet.
Please let me know if you have any suggestions.
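One possible direction (just a sketch, not a concrete proposal; the library filename, flag, and `pairwise_iou_3d` wrapper are hypothetical): load the compiled ops lazily and degrade gracefully when the shared library is not bundled in the wheel.

```python
import tensorflow as tf

try:
    # Present only in wheels (or source builds) that bundle the compiled custom ops.
    _custom_ops = tf.load_op_library("_keras_cv_custom_ops.so")
    CUSTOM_OPS_AVAILABLE = True
except (tf.errors.NotFoundError, OSError):
    _custom_ops = None
    CUSTOM_OPS_AVAILABLE = False


def pairwise_iou_3d(boxes_a, boxes_b):
    # Hypothetical wrapper: call the custom kernel when it is available,
    # otherwise fail with a clear message (or dispatch to a pure-TF fallback).
    if CUSTOM_OPS_AVAILABLE:
        return _custom_ops.pairwise_iou_3d(boxes_a, boxes_b)
    raise NotImplementedError(
        "This op requires building keras_cv from source with custom ops enabled."
    )
```

That way a pure-Python install still imports cleanly, and only the code paths that actually need the kernels fail.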
> I also worry about package accessibility, and I want to make sure we package this in a way that does not add friction for pip install and lets users who don't want the custom ops install without binaries. I'm not certain what the right solution to that is just yet.
I can understand that there was no time to address this subject, and that now, on your side, you just need to "deliver", but I have tried to discuss this topic since June 2021 and even earlier.
What will we do, once the C++ infrastructure is in the project, when contributors start to push their own custom kernels for performance, for vectorized maps not covered by the underlying API design of native TF ops, or for ops not covered by the TF-to-XLA bridges (https://github.com/openxla/stablehlo/issues/216#issuecomment-1263615220)?
Are you going to maintain a proliferation of user kernels, probably excluding some HW vendors (e.g. AMD or Google Cloud TPU)? And what about exporting to TFLite when, sooner or later, you do this for an inference component instead of a loss?
So you may well solve something urgent for your internal delivery with this, but after a year and a half I still don't understand what the strategy is.
@bhack I've updated the README and contributing guide to reflect a posture that we do not intend to invite user-contributed custom ops at this time, and that we only intend to include training-time CPU ops.
/gcbrun
> @bhack I've updated the README and contributing guide to reflect a posture that we do not intend to invite user-contributed custom ops at this time, and that we only intend to include training-time CPU ops.
Ok, but I still hope that when we cannot express an op, the compiler team can clarify why it cannot be expressed: in this bootstrap case, why we cannot efficiently express IoU3D in (?)HLO.
@ianstenbit @martin-gorner I've opened a new thread at https://github.com/openxla/xla/discussions/17
/gcbrun
/gcbrun
/gcbrun
Also I'm leaning towards only supporting Linux for now until we know better
> Also I'm leaning towards only supporting Linux for now until we know better
sgtm. I will leave the config code in for all platforms for now, but when we go to build our wheels we can build Linux-only for now (and other platforms can have a Python-only wheel)
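A rough sketch of how that split could look on the packaging side (names and paths are assumptions, not the actual keras_cv build configuration): bundle the compiled library only into the Linux wheel, and ship a pure-Python package everywhere else.

```python
import sys

from setuptools import find_packages, setup

# Ship the compiled custom-op library only on Linux; other platforms get a
# pure-Python wheel and skip the custom-op code path at runtime.
package_data = (
    {"keras_cv": ["custom_ops/*.so"]} if sys.platform.startswith("linux") else {}
)

setup(
    name="keras-cv",
    packages=find_packages(),
    package_data=package_data,
)
```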
/gcbrun
/gcbrun
/gcbrun