TranSirius

Results 3 issues of TranSirius

Hey, thanks for your code! It seems that some verbs should be in the third-person singular form, such as 'it train*s* relatively slower'.

An early-stopping scheme could improve performance to a certain extent. Why isn't it used in this code?
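For context, here is a minimal sketch of the kind of early stopping the question refers to. It is not taken from the repository: the `EarlyStopping` class, its `patience` and `min_delta` parameters, and the validation losses are all hypothetical, chosen only to illustrate the idea of halting once validation loss stops improving.

```python
class EarlyStopping:
    """Hypothetical helper: stop when validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")      # best validation loss seen so far
        self.bad_epochs = 0           # consecutive epochs without improvement

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses standing in for a real training loop.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best loss {stopper.best}")
```

In a real training loop, the model weights from the best epoch would typically be checkpointed and restored after stopping. Note that with these made-up numbers the later, lower loss is never reached, which is the usual trade-off when choosing `patience`.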

The general question is: does mamba-ssm currently support sequence parallelism in the mixer? I noticed that Section 8.2 of the Mamba2 paper proposes a potential way to split activation...