edgetpu performance improvement question on multiple models/concurrent execution

Open jk78346 opened this issue 4 years ago • 2 comments

Hi Team, I notice that there are several ways to improve performance, such as pipelining a model across multiple Edge TPUs and co-compiling models to avoid alternating cache eviction. If two models are small enough to be cached on the device at the same time, does the Edge TPU support concurrent execution of both on a single Edge TPU?

Another way to think about this: I could design a single model that takes multiple inputs and produces multiple outputs. However, this does not seem to give any benefit, and it requires 'split' and 'concat' operations. In this case, how costly are those two operations? And does this design actually yield a benefit? (In my testing, the multiple-input/output model is more than 2x slower than the single one, which means it is slower than simply executing the latter twice in serial.)
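To make the comparison in the parenthetical concrete, here is a minimal timing sketch. It is not taken from the Edge TPU tooling: `compare_latency`, `run_single`, and `run_fused` are hypothetical names, and the two callables are assumed to wrap whatever performs one inference (for example, a TFLite interpreter's `invoke()` on the single-input model and on the multi-input model respectively).

```python
import time
from typing import Callable, Dict


def compare_latency(run_single: Callable[[], None],
                    run_fused: Callable[[], None],
                    fold: int,
                    repeats: int = 50,
                    warmup: int = 5) -> Dict[str, float]:
    """Time `fold` serial runs of the single model against one fused run.

    Both results are mean seconds needed to process `fold` inputs.
    """
    def mean_time(fn: Callable[[], None], calls_per_iter: int) -> float:
        # Warm-up runs are excluded so one-time costs (such as loading
        # parameters onto the device) do not skew the average.
        for _ in range(warmup):
            fn()
        start = time.perf_counter()
        for _ in range(repeats):
            for _ in range(calls_per_iter):
                fn()
        return (time.perf_counter() - start) / repeats

    serial = mean_time(run_single, fold)  # `fold` invocations per batch
    fused = mean_time(run_fused, 1)       # one invocation per batch
    ratio = fused / serial if serial > 0 else float("inf")
    return {"serial_s": serial, "fused_s": fused, "fused_over_serial": ratio}
```

A `fused_over_serial` value above 1.0 would mean the multi-input model is slower than running the single model `fold` times in a row, which is the situation described above.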

Dec 10 '20 06:12 jk78346

Some details on my test. I tried the following model:

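import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, concatenate

# IN_W, IN_H, IN_C, OUT_C, F_W, F_H, S_W, S_H and weights are defined elsewhere.
def build_multi_input_model(fold):  # function name is illustrative
    input_list = []
    conv_list = []
    for i in range(fold):
        # One independent input and one convolution per branch.
        input0 = keras.layers.Input(shape=(IN_W, IN_H, IN_C))
        input_list.append(input0)
        conv = Conv2D(filters=OUT_C, kernel_size=(F_W, F_H), strides=(S_W, S_H), activation='linear', weights=[weights], use_bias=False, trainable=False)(input0)
        conv_list.append(conv)
    # Join the branch outputs along axis 1.
    out = concatenate(conv_list, axis=1)
    model = tf.keras.models.Model(inputs=input_list, outputs=[out])
    return model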

and the edgetpu_compiler (version: 14.1.317412892) gives the error message:

F tensorflow/lite/toco/graph_transformations/quantize.cc:606] Check failed: is_rnn_state_array

I don't see why this check fails.

And the 'split' and 'concat' version I mentioned is the following:

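import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Lambda, concatenate

# IN_W, IN_H, IN_C, OUT_C, F_W, F_H, S_W, S_H and weights are defined elsewhere.
def build_split_concat_model(fold):  # function name is illustrative
    # One wide input, split into `fold` equal slices along axis 1.
    input0 = keras.layers.Input(shape=(IN_W*fold, IN_H, IN_C))
    split = Lambda(lambda x: tf.split(x, num_or_size_splits=fold, axis=1))(input0)
    conv_list = []
    for i in range(fold):
        conv0 = Conv2D(filters=OUT_C, kernel_size=(F_W, F_H), strides=(S_W, S_H), activation='linear', weights=[weights], use_bias=False, trainable=False)(split[i])
        conv_list.append(conv0)
    # Join the per-slice outputs back along axis 1.
    out = concatenate(conv_list, axis=1)
    model = tf.keras.models.Model(inputs=[input0], outputs=out)
    return model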
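As a sanity check on what the split-and-concat model above computes, its output shape can be worked out by hand. The sketch below assumes Keras' default `padding='valid'` for `Conv2D` (the snippet does not set `padding`); the helper names `conv_out` and `split_concat_shape` are hypothetical and used only for illustration.

```python
from typing import Tuple


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def split_concat_shape(in_w: int, in_h: int,
                       f_w: int, f_h: int,
                       s_w: int, s_h: int,
                       out_c: int, fold: int) -> Tuple[int, int, int]:
    """Shape (without the batch axis) after split -> conv per slice -> concat.

    Each of the `fold` slices has shape (in_w, in_h, in_c); the convolved
    slices are concatenated back along the first spatial axis.
    """
    branch_w = conv_out(in_w, f_w, s_w)
    branch_h = conv_out(in_h, f_h, s_h)
    return (fold * branch_w, branch_h, out_c)


# Example: two 8x8 slices, 3x3 kernel, stride 1, 4 filters.
print(split_concat_shape(8, 8, 3, 3, 1, 1, 4, fold=2))  # (12, 6, 4)
```

For the same numbers, one convolution over the full 16-wide input would give `conv_out(16, 3, 1) == 14` along that axis rather than 12, so the split version is `fold` independent convolutions and not one large one.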

Dec 10 '20 06:12 jk78346

@jk78346

Have you tried optimizing the model with edgetpu_compiler 15.0? Let us know if this path is still being explored.

May 09 '21 23:05 Naveen-Dodda
