tfx-addons
A load test of the exported model
It would be helpful to have a component (either dedicated, or an extension of the
InfraValidator TFX Pipeline Component) that performs a load test of the exported model.
The expected behavior is to load the exported model, create a TensorFlow Serving endpoint, send requests to the prediction endpoint, and measure the response time. The load test could be performed with common open-source tools such as Locust or Vegeta for HTTP, or ghz for gRPC.
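The measurement step above could be sketched roughly as follows. This is only an illustration, not a proposed implementation: `run_load_test`, the endpoint URL, the model name `my_model`, and the request payload are all hypothetical placeholders, and a real component would likely delegate to Locust/Vegeta/ghz rather than hand-roll the driver.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, num_requests=100, concurrency=10):
    """Invoke `send_request` num_requests times across `concurrency`
    workers and return latency statistics in milliseconds."""
    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(num_requests)))

    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
        "max_ms": latencies[-1],
    }


# Hypothetical usage against a TensorFlow Serving REST endpoint
# (model name and payload are placeholders):
#
#   import requests
#   stats = run_load_test(lambda: requests.post(
#       "http://localhost:8501/v1/models/my_model:predict",
#       json={"instances": [[1.0, 2.0]]}))
```

A component could then "bless" the model only if, say, `stats["p95_ms"]` stays within the latency requirement.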
The motivation is that prediction time can vary with the model's type and structure. With this component, we can check early whether the model meets the business requirements for prediction time.
Larry and Hannes both confirmed that they have a requirement for low-latency serving that would benefit from this. Open questions remain about pushing and how to structure the pipeline: no blessing is needed for a test Serving instance, but a blessing would be required for a production Push, and exactly reproducing the production environment is difficult. InfraValidator does something similar and may be using TF Serving. Similarly to Evaluator, we could perhaps compare two models to assess the performance difference on the current test infrastructure.
See the project proposal.
Is this currently being worked on?