
Enable AMP (Automatic Mixed Precision) in TensorFlow Serving.

whatdhack opened this issue 4 years ago • 11 comments

Describe the problem the feature is intended to solve

AMP accelerates inference significantly.

Describe the solution

A flag for enabling AMP.

Describe alternatives you've considered

There is no alternative within TensorFlow Serving.

Additional context

N/A

whatdhack avatar Mar 25 '20 16:03 whatdhack

I think this should be very high priority (at the least FP16); otherwise the case for TFS becomes weak.

whatdhack avatar Apr 03 '20 16:04 whatdhack

AMP mainly targets training rather than serving (https://www.tensorflow.org/guide/keras/mixed_precision).
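
For reference, the training-side API that guide describes is roughly the following (a minimal sketch; 2020-era releases exposed the same thing under tf.keras.mixed_precision.experimental instead):

import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute in float16 where numerically safe; keep variables in float32.
mixed_precision.set_global_policy('mixed_float16')

# Layers created after the policy is set pick it up automatically.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10),
])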

Have you observed a significant performance difference for serving as well? If so, could you share the benchmark and related numbers?

shadowdragon89 avatar Apr 10 '20 16:04 shadowdragon89

How do I turn on AMP in serving? I have observed a 50% improvement in processing time with FP16 over FP32, without any noticeable change in accuracy. Reduced precision is one of the cornerstones of NVIDIA TensorRT, etc. See this one also - https://medium.com/@whatdhack/neural-network-inference-optimization-8651b95e44ee .
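
For comparison with the TensorRT route mentioned above, a minimal TF-TRT FP16 conversion sketch using the TF 1.x converter API (the model paths are hypothetical):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Rebuild the SavedModel with FP16 TensorRT engines where possible.
converter = trt.TrtGraphConverter(
    input_saved_model_dir='/models/mymodel/1',  # hypothetical path
    precision_mode='FP16')
converter.convert()
converter.save('/models/mymodel_trt_fp16/1')  # hypothetical path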

whatdhack avatar Apr 12 '20 21:04 whatdhack

Is there a way to do the following in TFS?

import tensorflow as tf  # TF 1.x

# Enable the auto-mixed-precision graph rewrite when the session is created.
config = tf.ConfigProto()
config.graph_options.rewrite_options.auto_mixed_precision = 1
sess = tf.Session(config=config)
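
A TF 2.x process can request the same Grappler rewrite without the session API; a minimal sketch (this still doesn't answer how to pass the option through the tensorflow_model_server binary):

import tensorflow as tf

# TF 2.x: turn on the same auto-mixed-precision Grappler rewrite globally.
tf.config.optimizer.set_experimental_options({'auto_mixed_precision': True})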

whatdhack avatar Apr 13 '20 06:04 whatdhack

I just ran some tests on a Mask R-CNN SavedModel in nvcr.io/nvidia/tensorflow:20.03-tf1-py3. TF_ENABLE_AUTO_MIXED_PRECISION seems to work very well for inference - it requires less memory and speeds things up significantly. The following are the numbers, if you need more convincing.

TF_ENABLE_AUTO_MIXED_PRECISION=1: memory = 4.2 GB, inference speed 0.25 sec.

vs.

without the flag: memory = 7.1 GB, inference speed 0.53 sec.
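
For anyone reproducing this: the variable has to be in the process environment before TensorFlow initializes its graph optimizer. A minimal sketch (the SavedModel path is hypothetical):

import os

# Must be set before TensorFlow is imported/initialized.
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

import tensorflow as tf  # TF 1.x, imported after the variable is set

# Load and run the SavedModel as usual; the rewrite is applied during
# graph optimization.
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ['serve'], '/models/mymodel/1')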

whatdhack avatar May 14 '20 16:05 whatdhack

Thanks for the experiments and numbers! Based on the numbers, we could add the option. I will also follow up with our GPU team.

shadowdragon89 avatar May 14 '20 18:05 shadowdragon89

Any update here? Also, is it possible to enable JIT/XLA as well, as in https://github.com/tensorflow/serving/issues/1515 ?
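
For the XLA side, the usual in-process toggle is the TF_XLA_FLAGS environment variable; a minimal sketch (again an environment setting, not a TFS flag):

import os

# Enable XLA auto-clustering before TensorFlow initializes.
os.environ['TF_XLA_FLAGS'] = '--tf_xla_auto_jit=2'

import tensorflow as tf  # imported after the flag is set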

jeisinge avatar Nov 02 '20 16:11 jeisinge

Any update here?

lre avatar Feb 22 '22 14:02 lre

I'd really appreciate this feature being added too

DerryFitz avatar Apr 12 '22 14:04 DerryFitz

Hi, any updates here?

junA2Z avatar May 17 '23 13:05 junA2Z

Hi, any updates here?

BobLiu20 avatar Sep 11 '23 06:09 BobLiu20