multilingual-modeling
Composable SFT
Paper: https://arxiv.org/pdf/2110.07560.pdf
Code: https://github.com/cambridgeltl/composable-sft
TODOs:
- Determine the hyperparameters to use for comparable testing. This will likely mean x training steps + 50k rewound steps with one iteration of their method, or possibly 5 iterations + 10k training steps twice. Not decided yet.
- Add support for loading an SFT from a path (not a main priority).
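For reference while picking step budgets, here is a rough sketch of the two-phase idea behind the "rewound" steps, as we understand it from the paper: fine-tune fully, keep the k entries that changed most, reset to the pretrained values, then train only those entries. It uses plain Python lists instead of tensors, and `select_top_k_mask` and `masked_update` are hypothetical names for illustration, not functions from the composable-sft repo.

```python
from typing import Dict, List, Set, Tuple

Params = Dict[str, List[float]]
Mask = Set[Tuple[str, int]]


def select_top_k_mask(pretrained: Params, finetuned: Params, k: int) -> Mask:
    """Return the (name, index) positions of the k entries that changed most."""
    diffs = [
        (abs(finetuned[name][i] - value), name, i)
        for name, values in pretrained.items()
        for i, value in enumerate(values)
    ]
    # Largest change first; ties are broken by name and index for determinism.
    diffs.sort(key=lambda d: (-d[0], d[1], d[2]))
    return {(name, i) for _, name, i in diffs[:k]}


def masked_update(params: Params, grads: Params, mask: Mask, lr: float) -> Params:
    """One plain SGD step that only touches masked positions (assumed optimizer)."""
    return {
        name: [
            v - lr * grads[name][i] if (name, i) in mask else v
            for i, v in enumerate(values)
        ]
        for name, values in params.items()
    }


# Toy values, made up for illustration only.
pretrained = {"w": [0.0, 1.0, 2.0], "b": [0.5]}
finetuned = {"w": [0.1, 1.9, 2.0], "b": [0.0]}

# Phase 1 result: pick the 2 entries with the largest absolute change.
mask = select_top_k_mask(pretrained, finetuned, k=2)

# Phase 2: rewind to the pretrained values and update only the masked entries.
grads = {"w": [1.0, 1.0, 1.0], "b": [1.0]}
rewound = masked_update(pretrained, grads, mask, lr=0.1)
```

In this toy run the mask selects `w[1]` and `b[0]`, so `w[0]` and `w[2]` keep their pretrained values after the rewound step.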
Training both adapters and Composable SFT at once would require some extra code. It is probably not too bad, but it would need extra testing to make sure all the correct parameters are frozen.
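One possible way to test the freezing logic is to build an explicit plan of which parameters are meant to train and assert on it. The sketch below is only an assumption about how that could look: `plan_trainable` is a hypothetical helper, the `"adapter"` substring convention is a guess at how adapter parameters are named, and the parameter names are made up. With a PyTorch model, the same plan could drive each parameter's `requires_grad` flag.

```python
from typing import Dict, Iterable, Set


def plan_trainable(
    param_names: Iterable[str],
    sft_names: Set[str],
    adapter_marker: str = "adapter",  # assumed naming convention
) -> Dict[str, str]:
    """Label every parameter as 'adapter', 'sft', or 'frozen'."""
    plan = {}
    for name in param_names:
        if adapter_marker in name:
            plan[name] = "adapter"  # trained as part of the adapter
        elif name in sft_names:
            plan[name] = "sft"  # trained under the sparse fine-tuning mask
        else:
            plan[name] = "frozen"  # must not receive updates
    return plan


# Hypothetical parameter names for illustration.
names = [
    "encoder.layer.0.attention.weight",
    "encoder.layer.0.adapter.down.weight",
    "encoder.layer.0.output.weight",
]
plan = plan_trainable(names, sft_names={"encoder.layer.0.attention.weight"})
frozen = [name for name, role in plan.items() if role == "frozen"]
```

A test could then compare `frozen` against the parameters that actually stay unchanged after a training step.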