[FEATURE] Add query_keys transformer version without split
Description
MXNet fuses the split, reshape, swapaxis and batch_dot operators for performance. In the GPT-2 model the same fusion can be applied if split is excluded from the pattern.
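For reviewers unfamiliar with the pattern, here is a rough NumPy sketch of the unfused computation in both forms. It is only an illustration, not the code in this PR: the function names, the tensor layout (batch, sequence, units) and the packed query/key/value input are assumptions, and `@` stands in for batch_dot.

```python
import numpy as np


def qk_scores_with_split(qkv, num_heads):
    """split -> reshape -> swapaxes -> batched matmul on a packed (B, T, 3*C) input."""
    batch, seq_len, packed = qkv.shape
    head_dim = (packed // 3) // num_heads
    # Assumed layout: queries, keys and values are packed along the last axis.
    q, k, _v = np.split(qkv, 3, axis=-1)
    q = q.reshape(batch, seq_len, num_heads, head_dim).swapaxes(1, 2)
    k = k.reshape(batch, seq_len, num_heads, head_dim).swapaxes(1, 2)
    # Batched Q @ K^T, giving (B, heads, T, T).
    return q @ k.swapaxes(-1, -2)


def qk_scores_without_split(query, key, num_heads):
    """reshape -> swapaxes -> batched matmul when queries and keys arrive separately."""
    batch, q_len, units = query.shape
    k_len = key.shape[1]
    head_dim = units // num_heads
    q = query.reshape(batch, q_len, num_heads, head_dim).swapaxes(1, 2)
    k = key.reshape(batch, k_len, num_heads, head_dim).swapaxes(1, 2)
    # Batched Q @ K^T, giving (B, heads, T_q, T_k).
    return q @ k.swapaxes(-1, -2)


rng = np.random.default_rng(0)
qkv = rng.standard_normal((2, 5, 3 * 8))   # batch=2, seq_len=5, units=8
query, key, _ = np.split(qkv, 3, axis=-1)

scores_a = qk_scores_with_split(qkv, num_heads=4)
scores_b = qk_scores_without_split(query, key, num_heads=4)
```

Both functions produce the same attention scores when given the same queries and keys; the second simply starts from tensors that are already separate, which is the case this PR targets.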

Checklist
Essentials
- [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage
- [x] Code is well-documented
Hey @agrabows, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]
CI supported jobs: [unix-cpu, unix-gpu, website, windows-cpu, centos-gpu, edge, sanity, centos-cpu, windows-gpu, miscellaneous, clang]
Note: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.

@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered : [unix-gpu]