
RCA's residual setup

Open • gaoyixuan111 opened this issue 10 months ago • 1 comment

The statements at the end of the WPlusAttnProcessor class define the residual connection. Is my reading correct that the initial hidden_states (the input from the previous step) is saved as residual, while the hidden_states obtained after the W+QKV calculation serves as the actual functional residual? The key statements appear in the following order:

    residual = hidden_states
    hidden_states = hidden_states + self.scale * wplus_hidden_states
    hidden_states = hidden_states + residual
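To make the question concrete, here is a minimal sketch of what the three quoted statements compute if they are taken literally, with nothing reassigning hidden_states between them. That is an assumption: the actual WPlusAttnProcessor may run other operations in between, which this issue does not show. The function name `rca_residual_literal` and the use of plain Python lists in place of tensors are hypothetical, chosen only for illustration.

```python
def rca_residual_literal(hidden_states, wplus_hidden_states, scale=1.0):
    """Apply the three quoted statements in order, element by element.

    Hypothetical helper: plain lists stand in for tensors, and no other
    operation is assumed to touch hidden_states between the statements.
    """
    # residual = hidden_states
    residual = list(hidden_states)
    # hidden_states = hidden_states + self.scale * wplus_hidden_states
    hidden_states = [h + scale * w
                     for h, w in zip(hidden_states, wplus_hidden_states)]
    # hidden_states = hidden_states + residual
    hidden_states = [h + r for h, r in zip(hidden_states, residual)]
    return hidden_states


x = [1.0, 2.0, 3.0]   # stand-in for the incoming hidden_states
w = [0.5, -1.0, 2.0]  # stand-in for wplus_hidden_states
out = rca_residual_literal(x, w, scale=0.5)

# Under the literal reading, the input is counted twice:
# out == 2 * x + scale * w
assert out == [2 * xi + 0.5 * wi for xi, wi in zip(x, w)]
```

Under this literal reading the input would contribute twice to the output. If hidden_states is instead overwritten by an attention result before the second statement, residual would be the only path carrying the original input, which is the distinction the question is asking about.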

Your work is excellent, thank you sincerely for your response.

gaoyixuan111 • Apr 23 '24 14:04