batch-inference-benchmarks
Consider using the new `predict_batch_udf` in Spark 3.4
See https://spark.apache.org/docs/3.4.0/api/python/reference/api/pyspark.ml.functions.predict_batch_udf.html and https://developer.nvidia.com/blog/distributed-deep-learning-made-easy-with-spark-3-4/. The current implementation appears to be affected by the issues that `predict_batch_udf` addresses, listed below.
The `predict_batch_udf` introduces standardized code for:
- Translating Spark DataFrames into NumPy arrays, so the end-user DL inferencing code does not need to convert from a Pandas DataFrame.
- Batching the incoming NumPy arrays for the DL frameworks.
- Model loading on the executors, which avoids any model serialization issues, while leveraging the `spark.python.worker.reuse` configuration to cache models in the Spark executors.
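A minimal sketch of this pattern, using a hypothetical toy linear model in place of a real DL framework: `make_predict_fn` is called once per executor to load the model, and the returned `predict` function receives batched NumPy arrays directly (no Pandas conversion in user code). The column name `"features"` and batch size are illustrative assumptions.

```python
import numpy as np


def make_predict_fn():
    # Runs once per Python worker: load the model here so it is cached
    # across batches (hypothetical stand-in for a real DL model load).
    weights = np.array([2.0, 3.0])

    def predict(inputs: np.ndarray) -> np.ndarray:
        # inputs arrives as a batched NumPy array, not a Pandas DataFrame
        return inputs @ weights

    return predict


# On Spark 3.4+, wrap it with predict_batch_udf (sketch; requires a SparkSession):
# from pyspark.ml.functions import predict_batch_udf
# from pyspark.sql.types import DoubleType
#
# preds = predict_batch_udf(make_predict_fn,
#                           return_type=DoubleType(),
#                           batch_size=1024)
# df = df.withColumn("prediction", preds("features"))
```

The key design point is that model loading lives inside `make_predict_fn`, so with `spark.python.worker.reuse` enabled the model is loaded once per worker rather than once per task or per batch.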