
feat: Support distributed batch inferencing job on Apache Spark cluster

Open parano opened this issue 4 years ago • 8 comments

parano avatar Jul 13 '20 05:07 parano

Is this related to https://github.com/bentoml/BentoML/issues/666 and https://github.com/bentoml/BentoML/pull/957 ?

Talador12 avatar Aug 20 '20 16:08 Talador12

@Talador12 sorry, I haven't had a chance to fill in the issue description yet. This one is very different from #666 and #957. This ticket is about applying an ML model packaged with BentoML to a large dataset on a Spark cluster. It should work for models trained with any of the ML frameworks that BentoML supports (e.g. TensorFlow, Scikit-learn, etc.), whereas #666 is about supporting serving of Spark MLlib models in BentoML.

Note that users can already do this with BentoML and Spark today, although we want to provide a set of tools on top of the existing BentoML input adapters API to make working with Spark's data types easier.
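The pattern users can follow today looks roughly like the sketch below: serialize the packaged model once on the driver, then deserialize it once per partition and apply it row by row. The class and function names here are hypothetical stand-ins, not BentoML's real API; with Spark the final step would be something like `rdd.mapPartitions(predict_partition)` or a pandas UDF instead of the plain Python loop shown.

```python
import pickle

# Hypothetical stand-in for a trained, packaged model; in practice this
# would be a model trained with any framework BentoML supports.
class ThresholdModel:
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [1 if x >= self.threshold else 0 for x in xs]

# Serialize once on the driver; the bytes get shipped to each executor.
model_bytes = pickle.dumps(ThresholdModel(threshold=0.5))

def predict_partition(rows):
    # Deserialize once per partition, not once per row, to amortize
    # the model-loading cost across the partition.
    model = pickle.loads(model_bytes)
    return model.predict(rows)

# On Spark this would be rdd.mapPartitions(predict_partition); plain
# Python lists stand in for partitions here to show the pattern.
partitions = [[0.1, 0.7], [0.9, 0.3]]
results = [predict_partition(p) for p in partitions]
# results == [[0, 1], [1, 0]]
```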

parano avatar Aug 20 '20 17:08 parano

I would like to pick up this work and here is the design doc https://docs.google.com/document/d/1C7_BT1kIF8Z2YJXioPUSg5J0yfDEfN5g5zYtcZW2Nx8/edit?usp=sharing

xuzikun2003 avatar Dec 20 '20 01:12 xuzikun2003

Have been discussing this with @xuzikun2003 @bojiang and here's an update:

We are investigating making the BentoService class/instance pickle-serializable by hooking the pickle interface into BentoML's own save and load implementation. This should allow users to create Spark UDFs from BentoML-packaged ML models more easily.
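A minimal sketch of that idea: implement `__reduce__` so that pickling a service delegates to its own save implementation, and unpickling delegates to the matching load. All names here (`MockBentoService`, `save_to_dir`, `_load_from_dir`) are hypothetical illustrations of the hook, not BentoML's actual interface.

```python
import os
import pickle
import tempfile

def _load_from_dir(path):
    # Hypothetical load implementation; pickle calls this on deserialization.
    with open(os.path.join(path, "weights.pkl"), "rb") as f:
        return MockBentoService(pickle.load(f))

class MockBentoService:
    def __init__(self, weights):
        self.weights = weights

    def save_to_dir(self, path):
        # Hypothetical save implementation standing in for BentoML's own.
        with open(os.path.join(path, "weights.pkl"), "wb") as f:
            pickle.dump(self.weights, f)

    def __reduce__(self):
        # Hook pickle into save/load: persist the service to a directory,
        # and tell pickle to reconstruct it via _load_from_dir(path).
        saved_path = tempfile.mkdtemp()
        self.save_to_dir(saved_path)
        return (_load_from_dir, (saved_path,))

svc = MockBentoService(weights=[1.0, 2.0])
restored = pickle.loads(pickle.dumps(svc))
# restored.weights == [1.0, 2.0]
```

Because Spark pickles closures and objects when shipping them to executors, a hook like this would let a service instance be referenced directly inside a UDF.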

Note that this will be a separate effort from the design doc shared above, which proposes BentoML's own batch inference API. The batch inference jobs API is a high-level API for launching and managing batch inference jobs, whereas the Spark UDF integration gives users more flexibility when working with Spark applications.

parano avatar Jan 19 '21 23:01 parano

This reads as an appropriate measure - Spark UDFs were made for this kind of custom code/integration. Thank you for taking the initiative on this! I will continue to follow and provide user feedback when I can :)

Talador12 avatar Jan 20 '21 16:01 Talador12

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 02 '21 17:06 stale[bot]

Is this on the roadmap for BentoML 1.0?

alexdivet avatar Mar 08 '22 11:03 alexdivet

@Talador12 @alexdivet We are focusing on streaming and batching now after building a solid foundation with 1.0. Would love to hear any feedback.

yubozhao avatar Aug 04 '22 03:08 yubozhao