Binxuan Huang

4 comments by Binxuan Huang

Hi @fanshiqing, if we use the legacy checkpointing method instead of distributed checkpointing, will we still encounter this issue?

> We've managed to train mamba by modifying the Huggingface Trainer class. Here is our [implementation](https://github.com/havenhq/mamba-chat/tree/main), we were actually able to train a chat model that seems to perform quite...

I am using PyTorch's FSDP with bf16 for training. It looks like I encountered a similar issue with NaN loss.

Could we set `logprobs` to a large number for vLLM and the OpenAI completion API, so that we can do the multiple-choice task with one-token generation?
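A minimal sketch of how the one-token answer could then be scored. It assumes the API returns, for the first generated token, a dict mapping candidate token strings to log-probabilities; the exact response shape differs between vLLM and the OpenAI API, so treat that as an assumption. `choice_probs` is a hypothetical helper: it looks up each option label (with and without a leading space, since many tokenizers emit " A" rather than "A") and renormalizes over the options only. This is why a large `logprobs` value matters: any option that falls outside the returned top-k list ends up with probability 0.

```python
import math
from typing import Dict, Sequence


def choice_probs(top_logprobs: Dict[str, float], choices: Sequence[str]) -> Dict[str, float]:
    """Renormalize first-token log-probs over the answer labels only.

    Hypothetical helper. Assumes `top_logprobs` maps token strings for the
    FIRST generated token to log-probabilities; check the real response format.
    """
    scores = {}
    for label in choices:
        # Accept both "A" and " A" spellings and keep the larger log-prob.
        scores[label] = max(
            top_logprobs.get(label, -math.inf),
            top_logprobs.get(" " + label, -math.inf),
        )
    finite = [s for s in scores.values() if s != -math.inf]
    if not finite:
        raise ValueError("no choice found in top_logprobs; request more logprobs")
    # Subtract the max before exponentiating for numerical stability;
    # math.exp(-inf) == 0.0, so missing choices get probability 0.
    m = max(finite)
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


if __name__ == "__main__":
    first_token = {" A": -2.3, " B": -0.1, " C": -3.0, "The": -1.5}
    probs = choice_probs(first_token, ["A", "B", "C", "D"])
    print(max(probs, key=probs.get))
```

Here "D" never appears in the returned top tokens, so it gets probability 0; with a larger `logprobs` setting it would more likely be scored properly.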