mycroft-precise
Huge performance impact of chunk size
I noticed a huge speed improvement when chunk_size is increased. I used a 50-unit GRU and a hop size of 0.01. For example, processing time is cut in half when using chunk_size=4096 instead of the default 2048.
From what I understand, a probability is calculated only at the end of each chunk, so doubling chunk_size means the dense layers run half as often. However, the GRU values are still calculated for every window, so there is no processing reduction there. Would that mean the bulk of the processing happens in the dense layers?
Anyway, if this is true then it might be good to put it in the readme as a way to improve speed, especially on the RPi. This would let the RPi run deeper models by sacrificing resolution (but not accuracy). With a sample depth of 2, a sampling frequency of 16000 Hz, and a chunk size of 4096, there are still ~8 predictions per second.
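The arithmetic above can be checked with a quick sketch (this assumes chunk_size counts bytes, so the number of samples per chunk is chunk_size divided by the sample depth):

```python
# Predictions per second for a given chunk size.
# Assumption: chunk_size is in bytes, so samples = chunk_size / SAMPLE_DEPTH.
SAMPLE_RATE = 16000   # Hz
SAMPLE_DEPTH = 2      # bytes per sample (16-bit audio)

def predictions_per_second(chunk_size_bytes):
    samples_per_chunk = chunk_size_bytes / SAMPLE_DEPTH
    return SAMPLE_RATE / samples_per_chunk

rate_default = predictions_per_second(2048)  # 15.625 predictions/s
rate_doubled = predictions_per_second(4096)  # 7.8125 predictions/s, i.e. ~8
```

Doubling the chunk size halves the prediction rate, which matches the ~8 predictions per second figure above.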
Just wanted to confirm, everything you said is correct. A lot of optimization has been done to speed up the audio processing so the GRU takes up the majority of the processing time. Sure, I think it'd be good to include it somewhere. Since most people reading the readme will just be using the default network size, what do you think about adding it in the training tutorial next to a new section that describes how to add more weights to the network?
I am still not sure why it takes half the time to do one inference rather than two inferences on the same number of samples. The same number of GRU steps needs to be calculated in both cases. Isn't the inference just a matter of reading the GRU state at that step? Since the GRU states and outputs still need to be calculated for every sample anyway, reading them as an output should be almost free.
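The point about reading states being nearly free can be illustrated with a toy single-layer GRU in NumPy (random weights and dimensions are made up for the sketch, not taken from Precise): the hidden state h is updated once per window no matter what, so collecting it at every step instead of only the last costs almost nothing extra.

```python
import numpy as np

# Toy single-layer GRU forward pass with hypothetical weights.
rng = np.random.default_rng(0)
n_in, n_hid, n_steps = 13, 20, 100  # e.g. 13 features per window (illustrative)
Wz, Wr, Wh = (rng.standard_normal((n_hid, n_in + n_hid)) * 0.1 for _ in range(3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_run(x_seq, return_sequences=False):
    h = np.zeros(n_hid)
    outputs = []
    for x in x_seq:
        xh = np.concatenate([x, h])
        z = sigmoid(Wz @ xh)                              # update gate
        r = sigmoid(Wr @ xh)                              # reset gate
        h_cand = np.tanh(Wh @ np.concatenate([x, r * h])) # candidate state
        h = (1 - z) * h + z * h_cand
        outputs.append(h.copy())  # recording the state here is essentially free
    return np.stack(outputs) if return_sequences else outputs[-1]

x_seq = rng.standard_normal((n_steps, n_in))
last = gru_run(x_seq)                              # state at the final step only
all_states = gru_run(x_seq, return_sequences=True) # every intermediate state
assert np.allclose(all_states[-1], last)           # identical recurrent work
```

Both calls perform exactly the same per-step matrix products; only the dense layers that sit on top of the GRU would run more often if a probability were read out at every step.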
I might try copying the weights into a new model with the return_sequences argument set to True. Then it should be possible to get the outputs for every sample in the chunk, regardless of chunk size. If this still yields the observed speed increase, it will make a huge difference, and the inference frequency would be the same for any chunk_size.
I think adding it to the training tutorial is a good idea. I can do that as soon as I find some free time.