
Issue loading finetuned mamba model into MambaLMHeadModel

Open thomas-bartlett opened this issue 2 years ago • 3 comments

I am getting the following error trying to load a Mamba model:

TypeError: MambaConfig.__init__() got an unexpected keyword argument '_name_or_path'

This is due to the config.json having this as its first line:

"_name_or_path": "state-spaces/mamba-130m-hf"

I get the error running:

model = MambaLMHeadModel.from_pretrained(model_name, device='cuda')

When loading a model trained with Lora, I get:

TypeError: expected str, bytes or os.PathLike object, not NoneType

I am guessing this is because there is no config.json file. Am I missing something?

thomas-bartlett avatar Mar 18 '24 19:03 thomas-bartlett

I guess you are mixing up state-spaces/mamba-130m with state-spaces/mamba-130m-hf. Use state-spaces/mamba-130m with MambaLMHeadModel; the -hf variant is converted for the Hugging Face transformers library, and its config.json contains transformers-specific keys that MambaConfig does not accept.
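For illustration, the `_name_or_path` TypeError comes from passing the HF-style config.json keys straight into `MambaConfig.__init__()`. A minimal workaround sketch, assuming you still want to load an HF-style config: filter the dict down to the keys the config class actually accepts before constructing it. The `filter_config_kwargs` helper and the stand-in `MambaConfig` dataclass below are hypothetical, not part of mamba_ssm.

```python
import inspect
from dataclasses import dataclass

def filter_config_kwargs(config_cls, config_dict):
    # Keep only keys that config_cls.__init__ accepts, dropping
    # transformers-only entries such as "_name_or_path".
    params = inspect.signature(config_cls.__init__).parameters
    return {k: v for k, v in config_dict.items() if k in params}

# Stand-in for mamba_ssm's MambaConfig, used here only to illustrate;
# the real class lives in mamba_ssm.models.config_mamba.
@dataclass
class MambaConfig:
    d_model: int = 768
    n_layer: int = 24

hf_style = {
    "_name_or_path": "state-spaces/mamba-130m-hf",  # rejected by MambaConfig
    "d_model": 768,
    "n_layer": 24,
}
cfg = MambaConfig(**filter_config_kwargs(MambaConfig, hf_style))
```

With the unknown keys stripped, construction succeeds; without the filter, `MambaConfig(**hf_style)` raises the TypeError from the original report.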

whaleloops avatar Apr 22 '24 13:04 whaleloops

I am encountering the same issue; maybe I chose the wrong model. 0.0

JaggerGu avatar May 10 '24 06:05 JaggerGu

> I guess you are mixing state-spaces/mamba-130m with state-spaces/mamba-130m-hf. Use state-spaces/mamba-130m for MambaLMHeadModel

I chose mamba-130m, but I still have this problem.

JaggerGu avatar May 10 '24 06:05 JaggerGu