
restructure the multi-head attention layer

Open jijoongmoon opened this issue 1 year ago • 1 comments

We can optimize the memory consumption of the multi-head attention layer by composing it from a combination of layers. By doing this, we could reduce peak memory further.

  1. Compute the attention heads one by one (see the sketch after this list).
  2. Re-implement the multi-head attention layer as a backbone layer.
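
Below is a minimal sketch of approach 1 in plain C++ (not nntrainer code; the names, shapes, and row-by-row softmax are illustrative assumptions). The point is that heads are computed sequentially, so only one head's score buffer and output are alive at a time instead of all heads' at once.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Mat = std::vector<std::vector<float>>; // [rows][cols]

// Scaled dot-product attention for a single head:
// out = softmax(Q K^T / sqrt(d)) V, computed one query row at a time.
Mat attention_head(const Mat &Q, const Mat &K, const Mat &V) {
  const size_t T = Q.size(), d = Q[0].size();
  Mat out(T, std::vector<float>(d, 0.0f));
  for (size_t i = 0; i < T; ++i) {
    std::vector<float> score(T, 0.0f);
    float max_s = -1e30f;
    for (size_t j = 0; j < T; ++j) {
      for (size_t k = 0; k < d; ++k)
        score[j] += Q[i][k] * K[j][k];
      score[j] /= std::sqrt(static_cast<float>(d));
      max_s = std::max(max_s, score[j]);
    }
    float sum = 0.0f;
    for (size_t j = 0; j < T; ++j) {
      score[j] = std::exp(score[j] - max_s); // numerically stable softmax
      sum += score[j];
    }
    for (size_t j = 0; j < T; ++j)
      for (size_t k = 0; k < d; ++k)
        out[i][k] += (score[j] / sum) * V[j][k];
  }
  return out;
}

// Approach 1: iterate over heads so that the intermediate activations of
// only one head are live at any moment, lowering peak memory versus
// computing all heads' scores in one batched tensor.
std::vector<Mat> multi_head_one_by_one(const std::vector<Mat> &Qs,
                                       const std::vector<Mat> &Ks,
                                       const std::vector<Mat> &Vs) {
  std::vector<Mat> heads;
  for (size_t h = 0; h < Qs.size(); ++h)
    heads.push_back(attention_head(Qs[h], Ks[h], Vs[h]));
  return heads; // concat along the feature axis + output projection follow
}
```

The trade-off is latency: computing heads sequentially gives up the parallelism of one batched matmul, which is why any such change should be measured on latency as well as peak memory.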

jijoongmoon · Sep 07 '22 22:09

:octocat: cibot: Thank you for posting issue #1998. The person in charge will reply soon.

taos-ci · Sep 07 '22 22:09

To-do list for approach 1:

  • [ ] Enhance the split layer to split the input by a given number (the number of heads). #2025
  • [ ] Replace the multi-head attention layer with a sub-graph of existing layers (a rough sketch follows this list).
  • [ ] Compare peak memory consumption and latency before and after the change.
  • [ ] Compare peak memory consumption and latency before and after enabling the swap feature.
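
As a rough, hypothetical sketch of that sub-graph (again not the actual nntrainer graph API: `LayerDesc`, the node type strings, and the `name(index)` input notation are illustrative assumptions), the wiring would split the projected Q/K/V by head, attend per head, then concatenate:

```cpp
#include <string>
#include <vector>

// Illustrative stand-in for a graph node description (an assumption for
// this sketch, not an nntrainer type).
struct LayerDesc {
  std::string type;                // e.g. "split", "attention", "concat"
  std::string name;
  std::vector<std::string> inputs; // names of producer nodes
};

// Build the replacement sub-graph: one split per projected Q/K/V tensor,
// one attention node per head, and a final concat. Because the heads share
// no intermediates, a memory planner could reuse one head's buffers for
// the next.
std::vector<LayerDesc> make_mha_subgraph(int num_heads) {
  std::vector<LayerDesc> g;
  for (const std::string in : {"q_proj", "k_proj", "v_proj"})
    g.push_back({"split", in + "_split", {in}}); // split by num_heads (#2025)

  std::vector<std::string> head_outs;
  for (int h = 0; h < num_heads; ++h) {
    const std::string idx = std::to_string(h);
    g.push_back({"attention", "head_" + idx,
                 {"q_proj_split(" + idx + ")", "k_proj_split(" + idx + ")",
                  "v_proj_split(" + idx + ")"}});
    head_outs.push_back("head_" + idx);
  }
  g.push_back({"concat", "mha_concat", head_outs});
  return g; // the output projection layer follows the concat
}
```

How the split outputs are indexed and whether buffers are actually reused across heads depend on the split-layer enhancement in #2025 and on the existing memory planner, so the measurements in the last two checklist items are what would validate the approach.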

lhs8928 · Oct 25 '22 03:10