tfx
How to pass custom component to DataflowRunner?
I am using TFX on Kubeflow. I have written a custom component that does some work. I want that library (defining the custom_component) to be available on the Dataflow runner. Is there an example of how to do it?
Right now, it is complaining that my "custom_component" does not exist.
@sadeel can you use the TFX CLI [1] to package your custom component? See, for example, the part about building the image with skaffold mentioned in [2]. See also the custom component docs in [3].
[1] https://github.com/tensorflow/tfx/blob/master/docs/guide/cli.md
[2] https://github.com/tensorflow/tfx/blob/master/docs/tutorials/tfx/template_beam.ipynb
[3] https://github.com/tensorflow/tfx/blob/master/tfx/examples/custom_components/slack/README.md#compile-the-pipeline-gcp
I've done that and that works well when running just in KFP. However, inside KFP, I need to start a Dataflow job, and the Dataflow job needs to be aware of my custom component - I haven't figured a good way to do that.
Got it, this is a missing feature right now. We'll take a look at fixing this.
Not sure if this issue is still relevant, but the TFX docs now contain updated information on how to provide multiple dependencies to Dataflow. There are two options:
- Package your code as a tarball or via setup.py (it needs to contain the TFX code too; see the note below)
- Build a custom container image to be used by Dataflow's workers
If you provide your own packages, they will override TFX's own package for Dataflow (see https://github.com/tensorflow/tfx/blob/master/tfx/utils/dependency_utils.py#L63).
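To make the note above concrete, here is a simplified model of that rule: if the Beam args already carry a dependency flag, TFX does not append its own package, so whatever you ship must include TFX itself. This is a hypothetical sketch, not the code in dependency_utils.py; the helper names and the exact flag list are assumptions and may not match what a given TFX version checks.

```python
from typing import List

# Beam flags that already tell Dataflow how to install Python dependencies.
# Assumption: illustrative list only; TFX's real check may differ by version.
DEPENDENCY_FLAGS = (
    "--extra_package",
    "--setup_file",
    "--requirements_file",
    "--sdk_container_image",
)


def user_supplies_dependencies(beam_pipeline_args: List[str]) -> bool:
    """Hypothetical helper: True if any dependency flag is already present."""
    for arg in beam_pipeline_args:
        # Handle both "--flag=value" and "--flag value" spellings.
        name = arg.split("=", 1)[0]
        if name in DEPENDENCY_FLAGS:
            return True
    return False


def effective_beam_args(beam_pipeline_args: List[str], tfx_sdist: str) -> List[str]:
    """Hypothetical helper: append TFX's own package only if the user gave none."""
    if user_supplies_dependencies(beam_pipeline_args):
        # The user's packages win, so they must bundle TFX themselves.
        return list(beam_pipeline_args)
    return list(beam_pipeline_args) + [f"--extra_package={tfx_sdist}"]
```

For example, `effective_beam_args(["--setup_file=./setup.py"], "/tmp/tfx.tar.gz")` leaves the args untouched, which is why a setup.py that omits TFX leads to import errors on the workers.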
I couldn't get the multiple dependencies to work with a tarball or setup.py (we have internal dependencies which aren't publicly available via PyPI), but the Docker image worked perfectly.
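A minimal sketch of how the two options listed above might translate into `beam_pipeline_args`. The function name, project, region, bucket and image values are hypothetical placeholders. The flags are standard Beam/Dataflow options; `--sdk_container_image` is the newer name (older Beam releases used `--worker_harness_container_image`), and Dataflow custom containers need Runner v2.

```python
from typing import List, Optional


def dataflow_beam_args(
    project: str,
    region: str,
    temp_location: str,
    setup_file: Optional[str] = None,
    container_image: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build Dataflow args for one dependency option."""
    # This sketch expects exactly one of the two options.
    if (setup_file is None) == (container_image is None):
        raise ValueError("pass exactly one of setup_file or container_image")
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]
    if setup_file is not None:
        # Option 1: workers install the package built from this setup.py,
        # which must itself depend on tfx and contain the custom component.
        args.append(f"--setup_file={setup_file}")
    else:
        # Option 2: workers start from a prebuilt image that already has
        # tfx and the custom component installed (requires Runner v2).
        args += [
            "--experiments=use_runner_v2",
            f"--sdk_container_image={container_image}",
        ]
    return args
```

The resulting list would be passed as `beam_pipeline_args=` when constructing the TFX `Pipeline`. The container route sidesteps private-PyPI issues because everything is baked into the image ahead of time.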
Further references:
- https://www.tensorflow.org/tfx/guide/beam
- https://cloud.google.com/dataflow/docs/guides/using-custom-containers#docker
- https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#multiple-file-dependencies
- https://github.com/tensorflow/tfx/issues/3994#issuecomment-873554029
Big thanks to @wizjo for deep diving into this issue!
@sadeel As mentioned above, these steps work. Just make sure you are using the Docker image. Please go ahead and close the issue, as it has been resolved. Thanks!
Closing this issue as it has been stale for 2 weeks. Please post an update and we will reopen it. Thanks!!