Less Wright

Results: 80 comments by Less Wright

@awgu - is there a context manager or similar option in FSDP2 that would support gradient accumulation and thus enable this in titan? I know we talked about this for...
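For readers unfamiliar with the term, here is a minimal single-process sketch of gradient accumulation in plain PyTorch. It is not FSDP2 code and does not show how FSDP2 controls gradient synchronization; the model, the data, and `accum_steps` are all hypothetical and chosen only for illustration. It checks that gradients accumulated over micro-batches match the gradient of the full batch.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()  # mean reduction

accum_steps = 4  # number of micro-batches per optimizer step (assumed value)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

# Reference: gradient computed on the full batch in one pass.
model.zero_grad()
loss_fn(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# Gradient accumulation: backward on each micro-batch, step once at the end.
model.zero_grad()
for x, y in zip(inputs.chunk(accum_steps), targets.chunk(accum_steps)):
    # Scale so the summed micro-batch gradients equal the full-batch mean.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # .grad tensors are summed across calls
accum_grad = model.weight.grad.clone()

optimizer.step()       # single update using the accumulated gradient
optimizer.zero_grad()  # reset before the next accumulation window
```

In a sharded or data-parallel setting, the extra question raised in the comment is how to avoid communicating gradients on every micro-batch, which this sketch does not address.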

I will start a separate branch for PiPPy integration and begin testing with this.

So FastAI creates 2 param groups to split out the l1 and l2 params. I've made a temp function to avoid that: `def filter_all_params_no_split(layer_groups:Collection[nn.Module])->List[List[nn.Parameter]]: pure = [] buffer=[] for l in layer_groups:...
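The snippet above is cut off, so its body is unknown. As a rough illustration of the idea only (one parameter list per layer group, with no second group split out), here is a sketch in plain PyTorch. The function name `params_per_layer_group` and the toy model are hypothetical, and this is not the author's implementation.

```python
from typing import Collection, List

import torch
from torch import nn


def params_per_layer_group(
    layer_groups: Collection[nn.Module],
) -> List[List[nn.Parameter]]:
    """Hypothetical helper: one flat list of trainable params per layer group."""
    return [
        [p for p in group.parameters() if p.requires_grad]
        for group in layer_groups
    ]


# Toy example with a single layer group (assumed, for illustration).
layer_groups = [nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 2))]
groups = params_per_layer_group(layer_groups)

# Each layer group becomes exactly one optimizer param group.
optimizer = torch.optim.SGD([{"params": g} for g in groups], lr=0.1)
print(len(groups), len(optimizer.param_groups))
```

Keeping a one-to-one mapping between layer groups and optimizer param groups means weights, biases, and batch-norm parameters all share the same hyperparameters, which may or may not be what a given optimizer expects.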

Hi @IssamLaradji That's great to hear! I'm hoping to get it set up so that your SLS can be fully integrated with FastAI2 and thus be readily available as an...

It's handling the param groups in the sense that it doesn't blow up like before. However, it's not actually learning anything (loss ends up the same as random, i.e. 10 classes = accuracy...

![sls_not_learning](https://user-images.githubusercontent.com/46302957/71338121-8f728c80-2503-11ea-81a9-40fbda0dcb87.jpg)
Layer Groups Len 1
Len Split_params = 2
Opt results 1
Sls (
Parameter Group 0
    beta_b: 0.9
    beta_f: 2.0
    bound_step_size: True
    c: 0.1
    eta_max: 10
    gamma: 2.0
    init_step_size:...

I'll pick it up again tomorrow and try to isolate it more. I can't tell exactly where it's not working at this point, but it's at least running now in...

Hi @IssamLaradji - here's a relevant snippet, though I'm not sure how much it will help you. I had to make changes to three different FastAI files to get SLS to...