performance improvement question on multiple models/concurrent execution
Hi Team, I notice that there are several ways to improve performance, such as model pipelining across multiple Edge TPUs and the model co-compiling technique to avoid models repeatedly evicting each other from the on-device cache. If two models are small enough to fit in the cache at the same time, does the Edge TPU support concurrent execution of both on a single Edge TPU?
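For concreteness, the co-compiled case I have in mind looks roughly like the sketch below: two models compiled together with edgetpu_compiler (so their parameter data can stay in the on-device cache at the same time) and then invoked in alternation from one process. The model file names are placeholders, not files from my actual test.

# Sketch: keep two co-compiled models resident on one Edge TPU and invoke
# them in alternation. Model file names and input contents are placeholders.
import numpy as np
import tflite_runtime.interpreter as tflite

def make_interpreter(path):
    # One Edge TPU delegate per interpreter, both targeting the same device.
    interp = tflite.Interpreter(
        model_path=path,
        experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
    interp.allocate_tensors()
    return interp

interp_a = make_interpreter('model_a_edgetpu.tflite')
interp_b = make_interpreter('model_b_edgetpu.tflite')

for _ in range(10):
    for interp in (interp_a, interp_b):
        in_detail = interp.get_input_details()[0]
        interp.set_tensor(in_detail['index'],
                          np.zeros(in_detail['shape'], dtype=in_detail['dtype']))
        interp.invoke()
        # Because the models were co-compiled, alternating like this should
        # not force their parameters to be swapped in and out of the cache.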
Another way to think about this is that I could design a single model that takes multiple inputs and produces multiple outputs. However, this does not seem to gain anything and requires 'split' and 'concat' operations. How costly are those two operations, and does this design actually yield a benefit? (In my testing, the multiple-input/output model is about 2x slower than the single-input one, which means it is slower than just executing the single model twice in serial.)
Some details on my test: I tried the following model:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Lambda, concatenate

def build_multi_io_model(fold, weights):
    input_list = []
    conv_list = []
    for i in range(fold):
        # One independent input and one frozen Conv2D branch per fold.
        input0 = keras.layers.Input(shape=(IN_W, IN_H, IN_C))
        input_list.append(input0)
        conv = Conv2D(filters=OUT_C, kernel_size=(F_W, F_H), strides=(S_W, S_H),
                      activation='linear', weights=[weights],
                      use_bias=False, trainable=False)(input0)
        conv_list.append(conv)
    # Concatenate the per-branch outputs along the width axis.
    out = concatenate(conv_list, axis=1)
    model = tf.keras.models.Model(inputs=input_list, outputs=[out])
    return model
and the edgetpu_compiler (version: 14.1.317412892) gives the error message:
F tensorflow/lite/toco/graph_transformations/quantize.cc:606] Check failed: is_rnn_state_array
I don't see why this happens.
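The failing check is in the converter's quantization pass (quantize.cc), so for reference, here is a sketch of the full-integer post-training quantization step I would expect to run before edgetpu_compiler. This is only an assumption about the conversion flow, using the standard tf.lite.TFLiteConverter API; the random representative dataset is a placeholder.

# Sketch of the full-integer post-training quantization step that precedes
# edgetpu_compiler, assuming the standard tf.lite.TFLiteConverter flow.
# The random representative dataset below is only a placeholder.
import numpy as np
import tensorflow as tf

def representative_data():
    for _ in range(100):
        # One float32 calibration array per model input (`fold` inputs here).
        yield [np.random.rand(1, IN_W, IN_H, IN_C).astype(np.float32)
               for _ in range(fold)]

converter = tf.lite.TFLiteConverter.from_keras_model(build_multi_io_model(fold, weights))
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # the older Edge TPU flow expects uint8 I/O
converter.inference_output_type = tf.uint8

with open('model_quant.tflite', 'wb') as f:
    f.write(converter.convert())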
And the 'split' and 'concat' version I mentioned is the following:
def build_split_concat_model(fold, weights):
    # One wide input that is split into `fold` slices along the width axis.
    input0 = keras.layers.Input(shape=(IN_W * fold, IN_H, IN_C))
    split = Lambda(lambda x: tf.split(x, num_or_size_splits=fold, axis=1))(input0)
    conv_list = []
    for i in range(fold):
        # The same frozen Conv2D is applied to each slice.
        conv0 = Conv2D(filters=OUT_C, kernel_size=(F_W, F_H), strides=(S_W, S_H),
                       activation='linear', weights=[weights],
                       use_bias=False, trainable=False)(split[i])
        conv_list.append(conv0)
    # Reassemble the slices along the width axis.
    out = concatenate(conv_list, axis=1)
    model = tf.keras.models.Model(inputs=[input0], outputs=out)
    return model
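For reference, the kind of timing harness used for the "fused model vs. serial single model" comparison would look roughly like this. The compiled file names and the fold value are placeholders, and the warm-up-plus-average method is just one reasonable way to measure, not necessarily the exact setup of the original test.

# Rough timing sketch for comparing the fused split/concat model against
# running the single-conv model `fold` times in serial on the Edge TPU.
# The *_edgetpu.tflite file names are placeholders for the compiled outputs.
import time
import numpy as np
import tflite_runtime.interpreter as tflite

def load(model_path):
    interp = tflite.Interpreter(
        model_path=model_path,
        experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
    interp.allocate_tensors()
    return interp

def time_invoke(interp, runs=100):
    in_detail = interp.get_input_details()[0]
    interp.set_tensor(in_detail['index'],
                      np.zeros(in_detail['shape'], dtype=in_detail['dtype']))
    interp.invoke()  # warm-up: loads parameters onto the Edge TPU
    start = time.perf_counter()
    for _ in range(runs):
        interp.invoke()
    return (time.perf_counter() - start) / runs

fused = load('split_concat_edgetpu.tflite')
single = load('single_conv_edgetpu.tflite')
fold = 2  # placeholder: number of serial invocations matching the fused model's work
print('fused model:   %.3f ms' % (1e3 * time_invoke(fused)))
print('serial single: %.3f ms' % (1e3 * fold * time_invoke(single)))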
@jk78346
Have you tried optimizing the model with edgetpu_compiler 15.0? Let us know if this path is still being explored.