yury-tokpanov
Is it still under review? I don't see it listed in the documentation: https://www.backblaze.com/docs/cloud-storage-command-line-tools#usage But it's there if I do `b2 --help`.
Thanks! You should probably put a link to that doc on your website as well.
@tlrmchlsmth thank you very much for your work! If I may ask, do you have any updates since your last post?
@tlrmchlsmth Thanks for the update! I work at Zyphra, and we are interested in incorporating our Zamba2 model into vLLM (#9382). I'm using your PR as a starting point, since...
@tlrmchlsmth @fabianlim thanks for all your work! I have our internal implementation of Zamba2 based on a previous version of this PR. I'm going to rebase it. Would you recommend using...
I am unable to reproduce eval results for our Zamba2 model with lm_eval, both for some loglikelihood tasks (winogrande, arc tasks) and for generation tasks (like gsm8k), while some loglikelihood tasks...
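For context, this is roughly the eval setup I'm running, via lm_eval's Python API; the model id and task list below are illustrative placeholders, not my exact configuration:

```python
# Minimal sketch of the eval run; model id and tasks are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=Zyphra/Zamba2-7B,tensor_parallel_size=1",
    tasks=["winogrande", "arc_challenge", "gsm8k"],  # loglikelihood + generation
)
print(results["results"])
```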
The computation of gated RMS norm depends on the number of Mamba2 groups: https://github.com/state-spaces/mamba/blob/0cce0fa645f100f00620ddf2333c2b7712abfdec/mamba_ssm/ops/triton/layernorm_gated.py#L32 . Our 7B model has 2 groups, so this definitely affects our results. A sketch of the grouped computation is below. I'm still chasing other...
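For anyone hitting the same thing, here's a minimal PyTorch reference of what the grouped gated RMS norm computes (assuming the Mamba2 convention of gating before normalizing, i.e. `norm_before_gate=False`); the function name is mine, and this is a plain-eager sketch, not the Triton kernel linked above:

```python
import torch
import torch.nn.functional as F

def gated_rmsnorm_ref(x, z, weight, ngroups=1, eps=1e-5):
    # Gate before normalizing (norm_before_gate=False, as in Mamba2).
    x = x * F.silu(z)
    *batch, d = x.shape
    assert d % ngroups == 0, "hidden dim must be divisible by ngroups"
    xg = x.view(*batch, ngroups, d // ngroups)
    # RMS statistics are computed per group; with ngroups=1 this reduces
    # to ordinary RMSNorm, which is why a 2-group model diverges if the
    # grouping is ignored.
    rstd = torch.rsqrt(xg.pow(2).mean(dim=-1, keepdim=True) + eps)
    return (xg * rstd).reshape(*batch, d) * weight
```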
After fixing gated RMS norm, I was able to match gsm8k results for our 7B model. I still see numbers for some tasks coming in lower for some reason, so going to...
> @yury-tokpanov could you share what you did to fix gated rms norm? I don't see n_groups being handled in zamba here https://github.com/huggingface/transformers/blob/main/src/transformers/models/zamba2/modeling_zamba2.py#L64-L79

We have a new PR in transformers...
UPDATE: this is no longer an issue when using upstream vLLM. ~~I rebased using the latest version of this PR, and now I'm getting this error from `torch.ops._vllm_fa2_C.varlen_fwd()` in `vllm/vllm_flash_attn/flash_attn_interface.py:173` even though...