Results 32 issues of neverix

Currently, the StatefulTrainer cannot use [GradientTransformationExtraArgs](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.GradientTransformationExtraArgs). It would be fairly easy to add support with an extra keyword argument.

### Proposal

Support the SmolLM2 series of models (https://huggingface.co/HuggingFaceTB/SmolLM2-135M, etc.).

### Motivation

The SmolLM2 models use the Llama architecture and are trained on better data than older models of...