sparseml
TopKast additional tests + bugfix
Additional tests to ensure Top-KAST is working as intended.
Bugfix: when computing weight decay for the backward-only weights (set B in the paper), the multiplier should be proportional to 1/(the number of dense weights), not 1/(the sparsity).
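To make the difference concrete, here is a minimal pure-Python sketch of the per-weight decay multipliers. It is not the sparseml implementation. The function names (`topk_mask`, `decay_multipliers`) are hypothetical, and it assumes that the "dense" fraction is `1 - forward_sparsity`, that forward weights get a multiplier of 1, and that weights outside the backward set get no decay.

```python
def topk_mask(weights, density):
    """Mark the `density` fraction of weights with the largest magnitude."""
    k = max(1, round(density * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(weights))]


def decay_multipliers(weights, forward_sparsity, backward_sparsity):
    """Per-weight weight-decay multipliers (illustrative assumption, see above).

    forward weights        -> 1
    backward-only weights  -> 1 / forward_density   (the fixed behaviour)
    all other weights      -> 0
    """
    forward_density = 1.0 - forward_sparsity
    backward_density = 1.0 - backward_sparsity
    forward = topk_mask(weights, forward_density)
    backward = topk_mask(weights, backward_density)

    multipliers = []
    for in_forward, in_backward in zip(forward, backward):
        if in_forward:
            multipliers.append(1.0)
        elif in_backward:
            # The bug described above would correspond to 1.0 / forward_sparsity here.
            multipliers.append(1.0 / forward_density)
        else:
            multipliers.append(0.0)
    return multipliers


weights = [0.5, -0.1, 0.3, 0.05, -0.9, 0.2, 0.01, -0.4, 0.7, 0.02]
multipliers = decay_multipliers(weights, forward_sparsity=0.8, backward_sparsity=0.5)
```

With a forward sparsity of 0.8, the backward-only weights in this sketch get a multiplier of about 5 (1/0.2), whereas dividing by the sparsity would give about 1.25 (1/0.8). The gap grows as sparsity increases, which is where the incorrect multiplier under-penalises the backward-only weights the most.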