TranSirius
Results: 3 issues of TranSirius
Hey, thanks for your code! It seems that some verbs should use the third-person singular form, for example 'it train*s* relatively slower'.
An early-stopping scheme could improve performance to a certain extent. Why isn't it used in this code?
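For reference, here is a minimal sketch of what I mean by early stopping, assuming training is monitored through a per-epoch validation loss. The names `EarlyStopping`, `patience`, `min_delta`, and `stopping_epoch` are hypothetical and not taken from this repository.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest decrease that counts as an improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    stopper = EarlyStopping(patience, min_delta)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            return epoch
    return None
```

In a real training loop, `step` would be called once per epoch with the measured validation loss, and the checkpoint with the best loss would typically be kept.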
The general question is: does mamba-ssm currently support sequence parallelism in the mixer? I noticed that Section 8.2 of the Mamba2 paper proposes a potential way to split activation...