Lake Lee
I added the install step for `torch-model-archiver` as well. I think the link to the homepage at [pypi](https://pypi.org/project/torch-model-archiver/) needs to be updated. I didn't know where to install this until I...
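For reference, here is a minimal sketch of the install step I have in mind (my own example, assuming the standard PyPI install of the package linked above):

```python
# Sketch: install torch-model-archiver from PyPI if its CLI isn't already on PATH.
import shutil
import subprocess
import sys

if shutil.which("torch-model-archiver") is None:
    # Standard PyPI install; the package name matches the pypi link above.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "torch-model-archiver"])
```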
Hi @vasqu, @amyeroberts, thank you for the feedback and guidance. I noticed that the force push I made may have been a bit premature, especially considering the advice to wait...
@amyeroberts I've rebased my branch now that PR #32694 is merged and force-pushed the updates. Everything should be up-to-date. Let me know if anything else is needed!
Hello @amyeroberts, I've added the `[run-slow] mamba2` commit as requested, and I see that all CI tests have passed. Please let me know if there's anything else I should address....
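For anyone following along, this is roughly how I created the trigger commit (a sketch assuming the usual empty-commit convention for `[run-slow]` triggers; the commit message is what matters):

```python
# Sketch: push an empty commit whose message asks CI to run the slow mamba2 tests.
# Assumes the empty-commit convention for [run-slow] triggers.
import subprocess

subprocess.check_call(["git", "commit", "--allow-empty", "-m", "[run-slow] mamba2"])
subprocess.check_call(["git", "push"])
```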
Hello @amyeroberts, I've added the `[run-slow] mamba2` commit to re-trigger the tests, but it seems the CI tests are failing due to permission issues. Checking the logs, the tests are...
Hello, @amyeroberts. I’ve just pushed a commit that rebases my branch with the latest changes from the main branch. I expect that these updates should properly resolve the issue with...
Hi, just to confirm: If I apply this commit (https://github.com/wdykas/mamba/commit/bfec072693f050505b3b28f25bf532c2c9623ded), will it resolve the illegal memory access issue for long sequence lengths? I have struggled with a similar problem before....
I trained models using this repository's implementation **prior to this PR** and consistently observed a ~0.2% difference in grad norm across runs starting from the same checkpoint and data. This...
Could you clarify the `sequence length` used during training? To my knowledge, Mamba and Transformer models demonstrate comparable speeds when the `sequence length` reaches 2048 or somewhat longer. However, for...
The transition from Mamba1 to Mamba2 does not show significant improvements for short sequence lengths. As seen in the attached image, Mamba2 still performs slower than Transformers for shorter sequences....