Zheng Cai
> I tried using this branch but got an error about not getting expected number of gradients during backward (15 vs 16) ...

We are trying this PR because we want Mamba to process **packed sequences**, as is done in transformer-based models. If we directly pad the sequences with zeros, then...
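For context, a minimal sketch of the packed layout we mean, assuming a flash-attention-style `cu_seqlens` convention (the names `packed` and `cu_seqlens` here are illustrative, not this PR's actual API):

```python
import torch

# Three variable-length sequences with hidden size d = 4.
d = 4
seqs = [torch.randn(L, d) for L in (3, 5, 2)]

# Padded layout: (batch, max_len, d) with zero padding. The padding
# tokens still flow through the SSM scan and pollute its hidden state.
max_len = max(s.shape[0] for s in seqs)
padded = torch.zeros(len(seqs), max_len, d)
for i, s in enumerate(seqs):
    padded[i, : s.shape[0]] = s

# Packed layout: one (total_tokens, d) tensor plus cumulative sequence
# lengths, so a varlen kernel can reset state at each sequence boundary.
packed = torch.cat(seqs, dim=0)                       # shape (10, 4)
cu_seqlens = torch.tensor([0, 3, 8, 10], dtype=torch.int32)
```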
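On the "15 vs 16 gradients" error quoted above: that class of message comes from a `torch.autograd.Function` whose `backward` returns a different number of values than `forward` takes inputs. A toy reproduction of the general pattern (not this PR's code):

```python
import torch

class Scale(torch.autograd.Function):
    """Toy custom op: backward must return exactly one gradient per
    forward input, using None for non-tensor inputs like `scale`."""

    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x * scale

    @staticmethod
    def backward(ctx, grad_out):
        # forward took 2 inputs (x, scale), so backward must return 2
        # values; returning only one would raise the same kind of
        # "incorrect number of gradients" error as the 15-vs-16 report.
        return grad_out * ctx.scale, None

x = torch.randn(3, requires_grad=True)
Scale.apply(x, 2.0).sum().backward()
```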
I am curious because I ran into the same problem: the disk space used by Ray object spilling keeps growing until an out-of-disk error occurs.
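In case it helps, object spilling can at least be pointed at a larger volume via Ray's `_system_config`; a sketch, assuming a recent Ray version and a hypothetical path `/mnt/big_disk/ray_spill` (Ray is documented to delete spilled files once their objects go out of scope, so unbounded growth usually means the objects are still referenced somewhere):

```python
import json
import ray

# Redirect object spilling to a volume with enough free space.
ray.init(
    _system_config={
        "object_spilling_config": json.dumps(
            {"type": "filesystem",
             "params": {"directory_path": "/mnt/big_disk/ray_spill"}}
        )
    }
)
```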
> In general, yes. Which flavor of sequence parallelism are you referring to? The one in Megatron-LM?

Thanks for your timely response! Sure. I am referring to the one in...
Got it. Thanks!
Got it. Thank you Tri Dao!