InternEvo
[Feature] Support sequence parallelism in the head layer and embedding layer
Describe the feature
The head layer and embedding layer should not be in a separate parameter group.
Will you implement it?
- [X] I would like to implement this feature and create a PR!