NAG-BERT
Question regarding the speed up
The paper uses an argmax equation to obtain the output sequence. As I understand it, that would be solved with the Viterbi algorithm, whose complexity is again O(n). I'm confused about how this is faster than the autoregressive approach.
I think the reason is that the model only runs once, followed by Viterbi decoding. An autoregressive model has to run n times.
Hello, thank you for your question. The speedup comes from the fact that NAG-BERT only does the forward computation once, whereas autoregressive models have to do a forward pass n times, where n is the length of the output sequence.
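To make the difference concrete, here is a rough toy sketch that only counts forward passes. It is not the NAG-BERT code: `ToyModel`, `autoregressive_decode`, and `non_autoregressive_decode` are hypothetical names made up for illustration, the "model" just returns random scores, and a per-position argmax stands in for the decoding step (a Viterbi pass over the same scores would also need no extra forward passes).

```python
import random


class ToyModel:
    """Hypothetical stand-in for an expensive network; counts forward passes."""

    def __init__(self, vocab_size=5, seed=0):
        self.vocab_size = vocab_size
        self.rng = random.Random(seed)
        self.forward_calls = 0

    def forward(self, num_positions):
        # One (expensive) pass that returns a score vector per requested position.
        self.forward_calls += 1
        return [
            [self.rng.random() for _ in range(self.vocab_size)]
            for _ in range(num_positions)
        ]


def argmax(scores):
    # Index of the highest score.
    return max(range(len(scores)), key=scores.__getitem__)


def autoregressive_decode(model, n):
    tokens = []
    for _ in range(n):
        # A real model would condition on the tokens generated so far;
        # this toy only records that one forward pass is needed per token.
        scores = model.forward(1)[0]
        tokens.append(argmax(scores))
    return tokens


def non_autoregressive_decode(model, n):
    # A single forward pass scores all n positions at once.
    all_scores = model.forward(n)
    # The O(n) decoding over these scores is cheap next to a network pass.
    return [argmax(scores) for scores in all_scores]


n = 8
ar_model, nag_model = ToyModel(), ToyModel()
ar_tokens = autoregressive_decode(ar_model, n)
nag_tokens = non_autoregressive_decode(nag_model, n)
print(ar_model.forward_calls, nag_model.forward_calls)  # prints "8 1"
```

Both approaches still do O(n) work in the decoding step, but the dominant cost is the number of network forward passes: n sequential passes in the autoregressive case versus one in the non-autoregressive case.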