Would you post a minimal example of training this?
Amazing work, and I'm inspired by the connections to dynamical systems.
Would you mind showing us a minimal example of training or finetuning this?
same problem
We released just the core model because it can be dropped in as a replacement for the model in any training or fine-tuning pipeline, of which there are many. Is there an example application you have in mind?
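For concreteness, a minimal sketch of what that drop-in usage can look like, assuming the MambaLMHeadModel class and its from_pretrained helper in mamba_ssm.models.mixer_seq_simple (the CUDA kernels require a GPU):

```python
# Minimal sketch: the released model is a plain nn.Module mapping token ids to logits,
# so it can replace the model in an existing causal-LM pipeline.
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel  # path as in the repo

model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device="cuda")
input_ids = torch.randint(0, 50277, (1, 128), device="cuda")  # any tokenized batch
logits = model(input_ids).logits  # (batch, seq_len, vocab), like any causal LM
```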
Thanks for the reply, and whoa, just any PyTorch training setup will do? I'm mainly interested in next-token prediction.
Does it get along with, say, the Accelerate ecosystem for multi-node/multi-GPU training? I saw transformers in setup.py; how does that work? I thought this architecture wasn't related to it.
I assume optimizations like FlashAttention are no longer relevant?
When you release larger models (fingers crossed!), bitsandbytes will likely become relevant, as well as PEFT, QLoRA, and DeepSpeed.
I'm also curious about some training hyperparameters: learning rate? AdamW? weight decay?
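A minimal next-token-prediction loop along those lines might look as follows. The optimizer settings (AdamW with the learning rate, betas, and weight decay shown) are common language-model defaults, not values confirmed for Mamba, and the class path is assumed from the mamba_ssm package:

```python
# Sketch of plain-PyTorch next-token prediction with MambaLMHeadModel.
# Hyperparameters are illustrative, not the ones used for the released checkpoints.
import torch
import torch.nn.functional as F
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device=device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)

def train_step(input_ids):
    """One optimization step on a batch of token ids of shape (batch, seq_len)."""
    logits = model(input_ids).logits                 # (batch, seq_len, vocab)
    loss = F.cross_entropy(                          # shift so position t predicts token t+1
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Replace with batches from a real tokenized dataset (the released models use the GPT-NeoX tokenizer).
dummy_batch = torch.randint(0, 50277, (2, 512), device=device)
print(train_step(dummy_batch))
```

Because the model only consumes input_ids and returns logits, wrappers like Accelerate or DeepSpeed should be able to treat it like any other nn.Module.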
Agreed, even an example with the Hugging Face Trainer would be lovely. I am running into issues using the model with the Hugging Face Trainer, and even with plain causal language modeling in Transformers without the Trainer. Thank you for the incredible work as well; this is amazing.
See https://github.com/state-spaces/mamba/issues/6: I tried DeepSpeed ZeRO-3 with the HF Trainer API, and it looks good.
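For anyone trying to reproduce that, here is a hedged sketch of one way to wire the HF Trainer to the bare model with a DeepSpeed ZeRO-3 config. The wrapper class name is made up for illustration, and the DeepSpeed config is a minimal example rather than the one used in that issue:

```python
# Sketch: wrap MambaLMHeadModel so forward() returns a loss when labels are passed,
# then hand a ZeRO-3 config to TrainingArguments. Launch with the deepspeed launcher.
import torch.nn.functional as F
from torch import nn
from transformers import Trainer, TrainingArguments
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

class MambaForCausalLMWrapper(nn.Module):  # illustrative name, not part of either library
    def __init__(self, name="state-spaces/mamba-130m"):
        super().__init__()
        self.backbone = MambaLMHeadModel.from_pretrained(name)

    def forward(self, input_ids, labels=None, **kwargs):
        logits = self.backbone(input_ids).logits
        loss = None
        if labels is not None:  # shifted cross-entropy, since the Trainer expects a loss
            loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return {"loss": loss, "logits": logits}

ds_config = {  # minimal ZeRO-3 config; "auto" values are filled in from TrainingArguments
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}

args = TrainingArguments(
    output_dir="mamba-out",
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    num_train_epochs=1,
    deepspeed=ds_config,
)

# trainer = Trainer(model=MambaForCausalLMWrapper(), args=args, train_dataset=my_dataset)
# trainer.train()
```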
I added:
- cross-entropy loss,
- a Transformers config interface,
- a Transformers PreTrainedModel interface.
The results:
- saving with safetensors is tested,
- existing checkpoints can be loaded to continue pretraining,
- with 80 GB of VRAM, the maximum batch size is 8 at 4k context length, and one step took ~300 ms.
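For readers who want to replicate that setup, here is a rough sketch of the kind of wrapper described above. The class names are illustrative, and the exact MambaLMHeadModel constructor arguments depend on the mamba_ssm version:

```python
# Sketch: a PretrainedConfig/PreTrainedModel pair around the Mamba backbone so that
# save_pretrained/from_pretrained (safetensors included) and a built-in cross-entropy
# loss work with the Transformers tooling. Class names are hypothetical.
import torch.nn.functional as F
from transformers import PretrainedConfig, PreTrainedModel
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

class MambaHFConfig(PretrainedConfig):
    model_type = "mamba"

    def __init__(self, d_model=768, n_layer=24, vocab_size=50280, **kwargs):
        self.d_model = d_model
        self.n_layer = n_layer
        self.vocab_size = vocab_size
        super().__init__(**kwargs)

class MambaHFForCausalLM(PreTrainedModel):
    config_class = MambaHFConfig

    def __init__(self, config):
        super().__init__(config)
        # Constructor arguments assumed from the current mamba_ssm release.
        self.backbone = MambaLMHeadModel(
            d_model=config.d_model, n_layer=config.n_layer, vocab_size=config.vocab_size
        )

    def forward(self, input_ids, labels=None, **kwargs):
        logits = self.backbone(input_ids).logits
        loss = None
        if labels is not None:  # standard shifted cross-entropy
            loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return {"loss": loss, "logits": logits}

# model.save_pretrained("ckpt") writes config.json plus model.safetensors;
# MambaHFForCausalLM.from_pretrained("ckpt") reloads it to continue pretraining.
```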
Just saw your post. Great work; I tested it on my end with similar success.
Geez, open source is fast. Here's a chattified version with a simple training example: https://github.com/havenhq/mamba-chat/blob/main/train_mamba.py