Nikhil Pavan Kanaka

7 comments of Nikhil Pavan Kanaka

> Thanks for the response. I agree that it is more appropriate to post on the transformers repo. I'll make sure to do that next time. I've tried it with...

```
/users/PAS2581/kanaka/miniconda3/envs/grokk/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
/users/PAS2581/kanaka/miniconda3/envs/grokk/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py:240: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx,...
```
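For what it's worth, those are only deprecation warnings coming from the installed `mamba_ssm` package, not errors. A minimal sketch of the rename PyTorch is asking for (assuming PyTorch >= 2.4, with a toy `autograd.Function` standing in for `mamba_ssm`'s kernels):

```python
import torch

class ToyScan(torch.autograd.Function):
    # was: @torch.cuda.amp.custom_fwd
    @staticmethod
    @torch.amp.custom_fwd(device_type='cuda')
    def forward(ctx, x):
        return x * 2

    # was: @torch.cuda.amp.custom_bwd
    @staticmethod
    @torch.amp.custom_bwd(device_type='cuda')
    def backward(ctx, grad_out):
        return grad_out * 2
```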

I'll refer to the config, thanks.

```
from transformers import AutoConfig, MambaForCausalLM

config = AutoConfig.from_pretrained('state-spaces/mamba-130m')
model = MambaForCausalLM(config)
```

works fine. Mamba2 doesn't, though, when run with:

```
from transformers import AutoConfig, Mamba2ForCausalLM

config = AutoConfig.from_pretrained('state-spaces/mamba2-130m')
model = Mamba2ForCausalLM(config)
```
...

I understand that the configurations are not directly compatible with HuggingFace, and that using the conversion script can help obtain the correct default values for the configuration. I’m having difficulty...
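In case it's useful to anyone else hitting this: one workaround, instead of reusing the state-spaces config directly, is to build a HuggingFace `Mamba2Config` by hand. This is just a sketch; the hyperparameter values below are my guesses for the 130M scale, and the field names are taken from transformers' `Mamba2Config`, so verify them against the conversion script's output:

```python
from transformers import Mamba2Config, Mamba2ForCausalLM

# Guessed 130M-scale values (d_model=768, 24 layers); check against
# the config produced by the official conversion script.
config = Mamba2Config(
    hidden_size=768,
    num_hidden_layers=24,
)
model = Mamba2ForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # sanity-check the parameter count
```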

Sure, thanks. Please give me an update once you've done so.

Thanks for providing the Mamba2 130M configuration and weights.

1. I'm training the 130M model from scratch on an A100, with the same setup and parameters as GPT2. I've noticed...

I had a couple of questions regarding the model's default settings in the HuggingFace configuration. From the paper:

![image](https://github.com/user-attachments/assets/e33b3d3c-438f-4240-a910-f2390179fcd9)

Is the `no_bias_terms` parameter set to `True` by default in the...
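To check this on my end, I printed the bias-related fields off a default config; `use_bias` and `use_conv_bias` are the transformers field names, which I'm assuming correspond to the paper's `no_bias_terms` flag:

```python
from transformers import Mamba2Config

config = Mamba2Config()
print(config.use_bias)       # bias on the input/output projections
print(config.use_conv_bias)  # bias on the depthwise convolution
```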