tensorrt
Converted FP16 or INT8 models require up to 10 minutes to start up.
Why does it take so long and use almost 30 GB of GPU memory? Is it rebuilding the model every time I run it? Can you fix it?
To avoid the long startup time, please call converter.build() before you save the model; see the example here. Otherwise only a placeholder for the TRT engine is saved in the graph, and the TRT engine is rebuilt every time you load the model.
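For reference, a minimal FP16 flow might look like this (a sketch; the directory names and the input shape are placeholders):

import tensorflow as tf

params = tf.experimental.tensorrt.ConversionParams(precision_mode='FP16')
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir='saved_model_dir',  # placeholder path
    conversion_params=params,
)
converter.convert()

def input_fn():
    # Yield inputs with the shapes you expect to see at inference time.
    yield [tf.random.normal([1, 224, 224, 3])]  # placeholder shape

converter.build(input_fn=input_fn)  # build the TRT engines now ...
converter.save('trt_saved_model_dir')  # ... so they are serialized with the model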
I already have converter.build() in my code, but it does NOT prevent the rebuild at every startup in INT8 or FP16 mode.
import os
import tensorflow as tf

BATCH_SIZE = 1
data_directory = "/dataset/hr_train"
calibration_files = [os.path.join(path, name)
                     for path, _, files in os.walk(data_directory)
                     for name in files]
print('There are %d calibration files. \n%s\n%s\n...'
      % (len(calibration_files), calibration_files[0], calibration_files[-1]))

def parse_file(filepath):
    image = tf.io.read_file(filepath)
    image = tf.image.decode_image(image, channels=3)
    image = tf.image.random_crop(image, size=(360, 640, 3))
    image = tf.cast(image, tf.float32) / 255
    # image = tf.expand_dims(image, axis=0)
    return image

num_calibration_batches = 10
dataset = tf.data.Dataset.from_tensor_slices(calibration_files)
dataset = dataset.map(map_func=parse_file, num_parallel_calls=20)
dataset = dataset.batch(batch_size=BATCH_SIZE)
dataset = dataset.repeat(None)
calibration_dataset = dataset.take(num_calibration_batches)

def my_calibration_input_fn():
    for x in calibration_dataset:
        yield (x,)

params = tf.experimental.tensorrt.ConversionParams(
    precision_mode='INT8',
    maximum_cached_engines=1,
    use_calibration=True,
    # max_workspace_size_bytes=40000000,
)
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir=INPUT_SAVED_MODEL_DIR,
    conversion_params=params,
)
converter.convert(calibration_input_fn=my_calibration_input_fn)

def my_input_fn():
    inp1 = tf.random.normal([1, 360, 640, 3])
    yield [inp1]

converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
converter.save(OUTPUT_SAVED_MODEL_DIR)
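A quick way to see the rebuild is to time the first and second inference after loading the saved model (a rough sketch; OUTPUT_SAVED_MODEL_DIR is the directory saved above):

import time

loaded = tf.saved_model.load(OUTPUT_SAVED_MODEL_DIR)
infer = loaded.signatures['serving_default']
x = tf.random.normal([1, 360, 640, 3])

start = time.time()
infer(x)  # first call: this is where a minutes-long engine rebuild would show up
print('first inference:  %.1f s' % (time.time() - start))

start = time.time()
infer(x)  # second call: fast once the engine exists
print('second inference: %.1f s' % (time.time() - start))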
Looks like it's a BUG in the TRT engine.
Thanks @devalexqt for the update.
TF-TRT creates a new engine every time it sees an input shape that it cannot handle with an existing engine. For example, if you create an engine with batch size N=1 and then run inference with N=8, a new engine will be created (a large overhead) and stored in the engine cache. Further inference requests with N <= 8 should run using that engine without large overhead.
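For example, building an engine for the largest expected batch size up front avoids that runtime rebuild (a sketch; N=8 is just an illustrative maximum):

def my_input_fn():
    # Build an engine for the largest batch you expect to serve (here N=8);
    # later requests with N <= 8 should then reuse this engine.
    yield [tf.random.normal([8, 360, 640, 3])]

converter.build(input_fn=my_input_fn)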
If this does not apply to you, that means we have a bug. We would need some information on your network (preferably a reproducer script) to investigate that.
How does the memory size compare to the original model size? How many engines are created? If you have a large number of engines, that might explain the memory consumption. Increasing the minimum_segment_size parameter would reduce the number of engines and the memory consumption.
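For example (a sketch; the value 10 is only illustrative):

params = tf.experimental.tensorrt.ConversionParams(
    precision_mode='INT8',
    use_calibration=True,
    maximum_cached_engines=1,
    minimum_segment_size=10,  # only convert subgraphs with at least 10 nodes into TRT engines
)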
Here is how to print the number of engines:
import re
import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants

def get_func_from_saved_model(saved_model_dir):
    saved_model_loaded = tf.saved_model.load(
        saved_model_dir, tags=[tag_constants.SERVING])
    graph_func = saved_model_loaded.signatures[
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    return graph_func, saved_model_loaded

loaded_func, _ = get_func_from_saved_model("/tmp/models/trt_model")

print('Engine name          num of nodes')
n_engines = 0
pattern = re.compile(r'(TRTEngineOp_\d+_\d+)')
for func in loaded_func.graph.as_graph_def().library.function:
    m = pattern.search(func.signature.name)
    if m:
        n_engines += 1
        print("{:20s} {:5d}".format(m.group(1), len(func.node_def)))
print('\nTotal number of TensorRT engines', n_engines)
Let me check.
For testing I use the nvcr.io/nvidia/tensorflow:21.05-tf2-py3 Docker image with batch=1, and I create only one engine, for the input inp1 = tf.random.normal([1, 360, 640, 3]).
Output of your script:
Engine name          num of nodes
TRTEngineOp_0_0        475

Total number of TensorRT engines 1
The original model size on disk is 2.5 MB, but the converted model size is 13 MB.