bnb optimizers could use bnb.nn.StableEmbedding instead of torch.nn.Embedding
According to the bnb documentation here:
https://huggingface.co/docs/bitsandbytes/main/optimizers
https://huggingface.co/docs/bitsandbytes/main/explanations/optimizers#stable-embedding-layer
This line could switch between bnb.nn.StableEmbedding and torch.nn.Embedding, or the choice could be made configurable in the config file: https://github.com/Lightning-AI/litgpt/blob/a8aa4bae5043b81b0b5e54bed838d1b57e1e1fe7/litgpt/model.py#L28
There are also other places in the code where torch.nn.Embedding is used. A configurable version could look like the sketch below.
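A minimal sketch of the idea, assuming a hypothetical `use_stable_embedding` flag; this is not litgpt's actual code:

```python
import torch.nn as nn

def build_embedding(vocab_size: int, n_embd: int, use_stable_embedding: bool = False) -> nn.Module:
    # Hypothetical helper: pick the embedding class based on a config flag.
    if use_stable_embedding:
        import bitsandbytes as bnb  # only needed when the flag is set
        # StableEmbedding adds a layer norm and keeps 32-bit optimizer state for
        # the embedding, which bnb recommends when using its 8-bit optimizers.
        return bnb.nn.StableEmbedding(vocab_size, n_embd)
    return nn.Embedding(vocab_size, n_embd)
```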
Thanks for the note and good point; I didn't know about this.
One challenge I see with configuring it in the config file is that it's applied at model creation. But one can later optionally run with --quantize bnb.nf4 or not. So, ideally, that swap should only take place when calling the inference/training functions and leave the original model as-is, roughly as sketched below.
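A rough sketch of that idea, not the actual litgpt implementation: keep model creation unchanged and only swap in StableEmbedding right before training with a bnb optimizer. The `model.transformer.wte` attribute path is an assumption about litgpt's GPT layout:

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb

def swap_to_stable_embedding(model: nn.Module) -> None:
    # Replace the token embedding in place, preserving its weights.
    old = model.transformer.wte  # assumed attribute path for the token embedding
    new = bnb.nn.StableEmbedding(old.num_embeddings, old.embedding_dim)
    with torch.no_grad():
        new.weight.copy_(old.weight)  # keep any already-loaded weights
    model.transformer.wte = new.to(old.weight.device)
```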
After reading a bit more, this would only be required for training (due to the optimizer choice). I added it in #1770
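For context, the training-only pairing this refers to, with a tiny stand-in model (purely illustrative, not litgpt code):

```python
import bitsandbytes as bnb
import torch.nn as nn

# Stand-in model just to show StableEmbedding paired with a bnb 8-bit optimizer.
model = nn.Sequential(bnb.nn.StableEmbedding(1000, 64), nn.Linear(64, 1000))
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=3e-4)
```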