We can replace target layers with similar implementations that are "cheaper" in terms of FLOPs.
Examples include replacing:
- Transformer layers with ALiBi (Attention with Linear Biases) variants
- ResNet implementations with ResNet-RS
- Linear layers with sparse linear layers when they are preceded by aggressive dropout
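
The swap itself can be sketched as a recursive walk over a model's submodules. The snippet below is only an illustration and assumes PyTorch, which the text above does not name. `replace_layers` and `MaskedLinear` are hypothetical names introduced here, not part of any library. `MaskedLinear` merely emulates a sparse linear layer by zeroing a fixed random subset of weights; the underlying matmul is still dense, so actual FLOP savings would require a real sparse kernel. For brevity it replaces every `nn.Linear`, not only those preceded by dropout.

```python
import torch
from torch import nn


class MaskedLinear(nn.Linear):
    """Hypothetical stand-in for a sparse linear layer (illustration only).

    A fixed random binary mask zeroes part of the weight matrix. The matmul
    is still dense, so this emulates sparsity without saving FLOPs.
    """

    def __init__(self, in_features, out_features, density=0.25, bias=True):
        super().__init__(in_features, out_features, bias=bias)
        mask = (torch.rand(out_features, in_features) < density).float()
        self.register_buffer("mask", mask)  # saved with the module, not trained

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)


def replace_layers(module, target_type, make_replacement):
    """Recursively swap each child of `target_type`; return the swap count."""
    replaced = 0
    # Copy to a list so the module can be mutated while iterating.
    for name, child in list(module.named_children()):
        if isinstance(child, target_type):
            setattr(module, name, make_replacement(child))
            replaced += 1
        else:
            replaced += replace_layers(child, target_type, make_replacement)
    return replaced


# Usage: swap both linear layers in a small model that uses heavy dropout.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(0.9),
    nn.Linear(16, 4),
)
n = replace_layers(
    model,
    nn.Linear,
    lambda old: MaskedLinear(
        old.in_features, old.out_features, bias=old.bias is not None
    ),
)
```

Note that the replacement layers are freshly initialized, so this sketch discards any trained weights; copying weights from the old layer is left out for brevity.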