Nikhil Pavan Kanaka

7 comments of Nikhil Pavan Kanaka

> Thanks for the response. I agree that it is more appropriate to post on the transformers repo. I'll make sure to do that next time. I've tried it with...

```
/users/PAS2581/kanaka/miniconda3/envs/grokk/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
/users/PAS2581/kanaka/miniconda3/envs/grokk/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py:240: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx,...
```
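For what it's worth, those are only deprecation warnings coming from the installed `mamba_ssm` package, not errors. A minimal sketch of the rename PyTorch is asking for (assuming PyTorch >= 2.4, with a toy `autograd.Function` standing in for `mamba_ssm`'s kernels):

```python
import torch

class ToyScan(torch.autograd.Function):
    # was: @torch.cuda.amp.custom_fwd
    @staticmethod
    @torch.amp.custom_fwd(device_type='cuda')
    def forward(ctx, x):
        return x * 2

    # was: @torch.cuda.amp.custom_bwd
    @staticmethod
    @torch.amp.custom_bwd(device_type='cuda')
    def backward(ctx, grad_out):
        return grad_out * 2
```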

I'll refer to the config, thanks.

```
from transformers import AutoConfig, MambaForCausalLM

config = AutoConfig.from_pretrained('state-spaces/mamba-130m')
model = MambaForCausalLM(config)
```

works fine. Mamba2 doesn't, though, when run with:

```
from transformers import AutoConfig, Mamba2ForCausalLM

config = AutoConfig.from_pretrained('state-spaces/mamba2-130m')
model = Mamba2ForCausalLM(config)
```
...

I understand that the configurations are not directly compatible with HuggingFace, and that using the conversion script can help obtain the correct default values for the configuration. I’m having difficulty...
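In case it's useful to anyone else hitting this: one workaround, instead of reusing the state-spaces config directly, is to build a HuggingFace `Mamba2Config` by hand. This is just a sketch; the hyperparameter values below are my guesses for the 130M scale, and the field names are taken from transformers' `Mamba2Config`, so verify them against the conversion script's output:

```python
from transformers import Mamba2Config, Mamba2ForCausalLM

# Guessed 130M-scale values (d_model=768, 24 layers); check against
# the config produced by the official conversion script.
config = Mamba2Config(
    hidden_size=768,
    num_hidden_layers=24,
)
model = Mamba2ForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # sanity-check the parameter count
```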

Sure, thanks. Please give me an update once you've done so.

Thanks for providing the Mamba2 130M configuration and weights.

1. I'm training the 130M model from scratch on an A100, with the same setup and parameters as GPT2. I've noticed...

I had a couple of questions regarding the model's default settings in the HuggingFace configuration. From the paper:

![image](https://github.com/user-attachments/assets/e33b3d3c-438f-4240-a910-f2390179fcd9)

Is the `no_bias_terms` parameter set to `True` by default in the...
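To check this on my end, I printed the bias-related fields off a default config; `use_bias` and `use_conv_bias` are the transformers field names, which I'm assuming correspond to the paper's `no_bias_terms` flag:

```python
from transformers import Mamba2Config

config = Mamba2Config()
print(config.use_bias)       # bias on the input/output projections
print(config.use_conv_bias)  # bias on the depthwise convolution
```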