Generalized IA3
What's Here
Moves a more generalized IA3 adaptor implementation to Tango (PR pending) and provides an example script showing how to use it in Catwalk.
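IA3 freezes the pretrained weights and learns only small vectors that rescale activations elementwise (for example keys, values, and feed-forward intermediate activations). The pure-Python sketch below shows this idea on a single linear layer. `IA3ScaledLinear` and `matvec` are hypothetical names for illustration and are not the API of the Tango or Catwalk implementation.

```python
# Minimal sketch of IA3-style activation rescaling (hypothetical, not the Tango/Catwalk API).
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: one output per row of `weight`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


class IA3ScaledLinear:
    """Wraps a frozen linear map and rescales its outputs elementwise.

    Only `scale` is trainable. It starts at all ones, so the wrapped
    layer initially behaves exactly like the original one.
    """

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight              # frozen pretrained weights
        self.scale = [1.0] * len(weight)  # learned IA3 vector, one entry per output

    def __call__(self, x: Vector) -> Vector:
        return [s * y for s, y in zip(self.scale, matvec(self.weight, x))]

    def num_trainable(self) -> int:
        return len(self.scale)

    def num_frozen(self) -> int:
        return sum(len(row) for row in self.weight)


layer = IA3ScaledLinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(layer([1.0, 1.0]))  # [3.0, 7.0, 11.0]: same as the frozen layer at init
layer.scale = [1.0, 0.5, 2.0]
print(layer([1.0, 1.0]))  # [3.0, 3.5, 22.0]
print(layer.num_trainable(), layer.num_frozen())  # 3 6
```

Initializing the scale vector to ones means training starts from the unmodified pretrained model, and the number of trainable parameters grows with the layer's output width rather than with the full weight matrix.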
Results on piqa
The results are hardly impressive, but the IA3 implementation reduces validation loss and recovers much of the accuracy of the fully fine-tuned equivalent for every architecture that has a default configuration. A full fine-tune of gpt-j-6b does not fit on a single GPU, whereas IA3 training does, because its far fewer trainable parameters need far fewer optimizer states.
![Screen Shot 2022-09-13 at 6 57 54 PM](https://user-images.githubusercontent.com/40903802/190041418-4c7deeb5-f0d0-45b3-967e-992f32d50a9f.png)
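To make the memory argument concrete, the sketch below estimates Adam optimizer-state size, assuming two fp32 moment buffers per trainable parameter (as in PyTorch's `torch.optim.Adam`). The roughly 6B parameter count, the choice of IA3 vectors (keys, values, and feed-forward activations in every block), and the GPT-J-style dimensions (28 layers, 4096-wide keys/values, 16384-wide feed-forward) are illustrative assumptions. They are not taken from the Tango implementation or the runs shown above.

```python
# Rough optimizer-state estimate under stated assumptions (hypothetical numbers).
BYTES_FP32 = 4
ADAM_STATES_PER_PARAM = 2  # first and second moment estimates


def adam_state_bytes(trainable_params: int) -> int:
    """Bytes Adam needs for its moment buffers, assuming fp32 states."""
    return trainable_params * ADAM_STATES_PER_PARAM * BYTES_FP32


def ia3_param_count(n_layers: int, d_k: int, d_v: int, d_ff: int) -> int:
    """Assumes one IA3 vector each for keys, values and FFN activations per block."""
    return n_layers * (d_k + d_v + d_ff)


full_params = 6_000_000_000  # assumed round figure for a 6B-parameter model
ia3_params = ia3_param_count(n_layers=28, d_k=4096, d_v=4096, d_ff=16384)

GIB = 1024 ** 3
print(f"IA3 trainable params: {ia3_params:,}")
print(f"Adam states, full fine-tune: {adam_state_bytes(full_params) / GIB:.1f} GiB")
print(f"Adam states, IA3: {adam_state_bytes(ia3_params) / GIB:.4f} GiB")
```

Under these assumptions the full fine-tune needs tens of GiB for Adam states alone, before counting gradients and the weights themselves, while the IA3 states are a few MiB.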