Ajay Saini
There are a few problems with StochasticDepth determinism right now:
- `use_same_gpu_seed` assumes each process has exactly the same seed when instead each process has `seed = user provided seed...
## 🚀 Feature Request

An implementation that made only one call to `F.batch_norm` or `F.layer_norm` would be more performant than the one we have now. Before implementing the change, some...
## 🚀 Feature Request

Right now, `surgery.replace_module_classes` does not preserve any initialized model weights for modules that are replaced. However, in cases where there is a 1-1 mapping of weights...
Let's use `max_new_tokens` and mark `max_length` as deprecated; it's much clearer.
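
A rough sketch of what such a deprecation could look like, using only the standard library. The helper name `resolve_max_new_tokens` and the `prompt_length` parameter are hypothetical, and the sketch assumes `max_length` counts prompt tokens plus generated tokens, which the original does not state:

```python
import warnings
from typing import Optional


def resolve_max_new_tokens(
    prompt_length: int,
    max_new_tokens: Optional[int] = None,
    max_length: Optional[int] = None,
) -> int:
    """Return how many tokens to generate beyond the prompt.

    Hypothetical helper: accepts the deprecated `max_length` for backward
    compatibility and converts it to `max_new_tokens`.
    """
    if max_length is not None:
        # Tell callers to migrate, pointing at their call site.
        warnings.warn(
            "`max_length` is deprecated; use `max_new_tokens` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if max_new_tokens is not None:
            raise ValueError("Pass only one of `max_new_tokens` or `max_length`.")
        # Assumption: `max_length` includes the prompt, so subtract it.
        max_new_tokens = max(max_length - prompt_length, 0)
    if max_new_tokens is None:
        raise ValueError("`max_new_tokens` is required.")
    return max_new_tokens
```

Under that assumption, `max_length` only makes sense once the prompt length is known, whereas `max_new_tokens` states the generation budget directly.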
Opening this for review but do not merge