Question about the performance of PySpark model serving
One concern with PySpark model serving is real-time performance, i.e. latency.
Clipper provides a wrapper around the PySpark session, as described in the documentation:
The model container creates a long-lived SparkSession when it is first initialized and uses that to load this model once at initialization time. The long-lived SparkSession and loaded model are provided by the container as arguments to the prediction function each time the model container receives a new prediction request.
However, the long-lived SparkSession seems heavyweight because of the communication between the Spark master and workers, even on a single machine. Do you have any results on the latency of Spark model serving?
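For reference, the initialize-once / reuse-per-request pattern that the quoted documentation describes can be sketched in plain Python. This is only an illustration of the pattern, not Clipper's actual container code; `ModelContainer`, `_start_session`, and the dict standing in for the SparkSession are all hypothetical names:

```python
import time


class ModelContainer:
    """Illustrative sketch of the long-lived-session pattern.

    The dict returned by _start_session stands in for a SparkSession:
    expensive to create, cheap to reuse. All names here are hypothetical,
    not Clipper's real classes.
    """

    def __init__(self, load_model):
        # Both steps run once, at container start-up (the expensive part).
        self.session = self._start_session()
        self.model = load_model(self.session)

    def _start_session(self):
        # Placeholder for something like SparkSession.builder.getOrCreate().
        return {"started_at": time.time()}

    def predict(self, inputs):
        # On every request, the same session and model are handed to the
        # prediction function; neither is re-created per request.
        return [self.model(self.session, x) for x in inputs]
```

The remaining latency question is then about the per-request cost inside `predict`, since session creation is amortized away.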
Thanks!
@simon-mo @withsmilo Any thoughts?
MLflow uses https://github.com/combust/mleap for low-latency SparkML serving. It could be a good fit for us.
@withsmilo I will take a look at it!