context_recommendation About multi-head

Open EagleYing opened this issue 4 years ago • 0 comments

The article says that the K hidden units in the middle of the multi-head AutoEncoder capture local context information. But why does each hidden unit take the entire masked input mask(x)? That seems equivalent to computing a traditional DAE K times. Does this make sense?
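To make the question concrete, here is a minimal sketch of the setup as I understand it. It is not the repository's code: the helper names (`mask`, `linear`, `make_head`), the sizes, and the purely linear encoder without activations or a decoder are all assumptions for illustration. It shows that when every head reads the full mask(x), the K heads produce the same values as one wider encoder built by stacking their weights.

```python
import random

def mask(x, drop_prob, rng):
    """Randomly zero entries of x (assumed DAE-style corruption, mask(x))."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def linear(weights, x):
    """Apply one dense layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def make_head(n_in, n_hidden, rng):
    """Hypothetical head: its own encoder weights over the FULL input."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]

rng = random.Random(0)
n_in, n_hidden, K = 6, 2, 3  # illustrative sizes, not taken from the article

x = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
x_masked = mask(x, 0.3, rng)

heads = [make_head(n_in, n_hidden, rng) for _ in range(K)]

# Multi-head view: every head encodes the same full masked input.
multi_head = [linear(W, x_masked) for W in heads]
flat = [h for head_out in multi_head for h in head_out]

# Single wide encoder view: stack all head weights into one matrix.
stacked = [row for W in heads for row in W]
wide = linear(stacked, x_masked)

# With unrestricted inputs the two views coincide.
print(flat == wide)
```

Under these assumptions nothing in the forward pass forces a head to be "local"; any locality would have to come from something else, such as per-head input restrictions or the training objective.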

Dec 08 '20 13:12 EagleYing
