w-plus-adapter

RCA's residual setup

Open gaoyixuan111 opened this issue 1 year ago • 1 comment

The statements at the end of the WPlusAttnProcessor class define the residual connection. Is the initial hidden_states (the input from the previous step) stored as residual, while the hidden_states obtained after the W+ QKV computation serve as the actual functional residual? The key statements appear in this order:

residual = hidden_states
hidden_states = hidden_states + self.scale * wplus_hidden_states
hidden_states = hidden_states + residual

Your work is excellent, thank you sincerely for your response.

gaoyixuan111 avatar Apr 23 '24 14:04 gaoyixuan111

As I understand your question: in the statement hidden_states = hidden_states + self.scale * wplus_hidden_states, the second term, self.scale * wplus_hidden_states, is the residual in our work.

csxmli2016 avatar Apr 23 '24 15:04 csxmli2016
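
To make the two roles concrete, here is a minimal sketch of the three quoted statements using plain Python lists instead of tensors. The function name `combine` and its arguments are hypothetical and are not taken from the repository. It also assumes that the attention computation runs between the first and second statements, so that the block input and the attention output are different values; the question quotes only the key statements, so this is an assumption.

```python
def combine(block_input, attn_out, wplus_out, scale=1.0):
    """Element-wise sketch of the quoted statements (hypothetical helper)."""
    # Statement 1: keep the block input for the ordinary skip connection.
    residual = block_input
    # Assumption: attn_out stands in for hidden_states after the regular
    # attention computation that happens between statements 1 and 2.
    hidden_states = attn_out
    # Statement 2: add the scaled W+ branch. Per the reply above, this second
    # term (scale * wplus_out) is what the authors call the residual.
    hidden_states = [h + scale * w for h, w in zip(hidden_states, wplus_out)]
    # Statement 3: add the stored block input back (standard skip connection).
    hidden_states = [h + r for h, r in zip(hidden_states, residual)]
    return hidden_states


if __name__ == "__main__":
    out = combine([0.5, 0.5], [1.0, 2.0], [10.0, 20.0], scale=0.5)
    print(out)
```

With `scale=0.0` the W+ term drops out and only the skip connection remains, which is one way to see that the two additions play separate roles.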