DeepXi
Can the MHANet run in real time?
Hi,
I am confused about whether the MHANet can work in real time. From my understanding, the masked attention only covers the causal scenario, so it may not be applicable to real time.
Best Regards, looking forward to your reply.
I did not get the chance to develop the model to run on a real-time system.
It would need some more development, but I assume it's possible. You could do things like reuse past keys and values in the attention mechanism to speed up processing, and choose a window of time-steps for the model that allows it to run fast enough on a device to be real time. So a few compromises would need to be made, I assume. Also, a device with a GPU would make things much easier.
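To illustrate the windowed idea above, here is a minimal NumPy sketch (the function name and shapes are my own, not from DeepXi): each incoming frame attends only over a bounded buffer of recent frames, so per-step cost stays constant however long the stream runs.

```python
import numpy as np
from collections import deque

def streaming_attention_step(q_t, k_t, v_t, key_buf, val_buf):
    """One hypothetical single-head attention step over a bounded
    history: the deques drop frames older than the window, so each
    step costs O(window) regardless of stream length."""
    key_buf.append(k_t)
    val_buf.append(v_t)
    K = np.stack(key_buf)                 # (<=window, d)
    V = np.stack(val_buf)
    scores = K @ q_t / np.sqrt(len(q_t))  # scaled dot-product
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # softmax over the window only
    return w @ V                          # context vector, shape (d,)

# usage: process a stream frame by frame with a 3-frame window
d, window = 4, 3
keys, vals = deque(maxlen=window), deque(maxlen=window)
rng = np.random.default_rng(0)
for _ in range(10):
    x = rng.standard_normal(d)            # stand-in for one frame's q/k/v
    out = streaming_attention_step(x, x, x, keys, vals)
```

The `deque(maxlen=window)` is what enforces the compromise: older context is simply dropped rather than recomputed.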
Maybe a paper like this could give you some ideas: https://arxiv.org/abs/2010.11395
I could be wrong, but I am sure it is very possible with some modifications.
Aaron.
Yes, I also think it's possible for the model to run on a real-time system.
a) For a masked attention matrix (full history, 0 lookahead):

1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1

I think it behaves differently at training time and at inference time.
b) For a masked attention matrix (N history, 0 lookahead), where N is the window size, if N=3 we get:

1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
0 1 1 1 0 0
0 0 1 1 1 0
0 0 0 1 1 1

But I am not sure whether this is suitable for real-time systems. In particular, can a model trained with method (b) be applied to inference on streaming audio?
Thanks.
Sounds like an interesting problem to investigate :) I am sure it could work with some constraints. Consider things like using previously computed keys to speed up processing, e.g., this is done with language models when generating text to speed up decoding: https://github.com/huggingface/transformers/blob/820c46a707ddd033975bc3b0549eea200e64c7da/src/transformers/models/gpt2/modeling_gpt2.py#L984
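The key-reuse trick in the linked GPT-2 code can be sketched in a few lines of NumPy (this is a hypothetical single-head toy, not the Hugging Face implementation): past keys and values are cached, so each new time-step only projects one frame instead of re-projecting the whole history.

```python
import numpy as np

class CachedSelfAttention:
    """Minimal single-head sketch of key/value caching: projections
    for past frames are stored and never recomputed."""
    def __init__(self, d):
        rng = np.random.default_rng(0)
        self.Wq = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wk = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wv = rng.standard_normal((d, d)) / np.sqrt(d)
        self.keys, self.vals = [], []         # the cache

    def step(self, x_t):
        q = x_t @ self.Wq                     # only the new frame is projected
        self.keys.append(x_t @ self.Wk)       # cached for all later steps
        self.vals.append(x_t @ self.Wv)
        K, V = np.stack(self.keys), np.stack(self.vals)
        s = K @ q / np.sqrt(len(q))           # scores over cached history
        w = np.exp(s - s.max())
        w /= w.sum()
        return w @ V

# usage: feed a stream one frame at a time
attn = CachedSelfAttention(4)
for _ in range(5):
    y = attn.step(np.ones(4))
```

Combined with the windowed mask from (b), the cache could also be truncated to the last N entries to keep memory bounded.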
Thanks, I will read up on the relevant material.