Less Wright comments


[Feature] Add gradient accumulation

@awgu - is there a context manager or similar option in FSDP2 that would support gradient accumulation and thus enable this in titan? I know we talked about this for...
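For readers unfamiliar with the term, here is a minimal, framework-free sketch of what gradient accumulation computes. It does not use FSDP2 or titan APIs, and every name in it (`grad_mse`, `accumulated_grad`, the toy data) is hypothetical. It only illustrates that summing micro-batch gradients, each weighted by its share of the full batch, reproduces the full-batch gradient for a mean-reduced loss.

```python
# Toy model: y ≈ w * x with a mean-squared-error loss.
# All names and data here are illustrative assumptions, not library APIs.

def grad_mse(w, xs, ys):
    """Gradient of (1/n) * sum((w*x - y)^2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients without updating w in between.

    Each micro-batch gradient is scaled by (micro-batch size / full batch size)
    so that the accumulated value matches the full-batch mean gradient.
    """
    total = len(xs)
    accum = 0.0
    for start in range(0, total, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        accum += grad_mse(w, mx, my) * (len(mx) / total)
    return accum


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch_size=2)
```

In a real training loop the optimizer step would run only once after the last micro-batch. With sharded training, the extra question is usually whether gradient synchronization can be deferred until that last micro-batch, which is what a context manager or similar option would control.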

pretraining.sh launch script is hardcoded to dev folders/env...can you provide a generic launch script for regular servers?

Thanks very much for the fast fix!

add pipeline parallelism to train llama

I will start a separate branch for PiPPy integration and begin testing with this.

SLS and parameter groups for larger datasets?

FastAI creates two param groups to split out the l1 and l2 params, so I've made a temporary function to avoid that: `def filter_all_params_no_split(layer_groups:Collection[nn.Module])->List[List[nn.Parameter]]: pure = [] buffer=[] for l in layer_groups:...

SLS and parameter groups for larger datasets?

Hi @IssamLaradji, that's great to hear! I'm hoping to get it set up so that your SLS can be fully integrated with FastAI2 and thus be readily available as an...

SLS and parameter groups for larger datasets?

Excellent - testing it now!

SLS and parameter groups for larger datasets?

It's handling the param groups in the sense that it doesn't blow up like before. However, it's not actually learning anything (loss ends up the same as random, i.e. 10 classes = accuracy...

SLS and parameter groups for larger datasets?

![sls_not_learning](https://user-images.githubusercontent.com/46302957/71338121-8f728c80-2503-11ea-81a9-40fbda0dcb87.jpg) Layer Groups Len 1 Len Split_params = 2 Opt results 1 Sls ( Parameter Group 0 beta_b: 0.9 beta_f: 2.0 bound_step_size: True c: 0.1 eta_max: 10 gamma: 2.0 init_step_size:...

SLS and parameter groups for larger datasets?

I'll pick up on it again tomorrow and try to isolate it more. I can't tell exactly where it's not working at this point, but it's at least running now in...

SLS and parameter groups for larger datasets?

Hi @IssamLaradji - here's a relevant snippet, but I'm not sure how much it will help you. I had to make changes to three different FastAI files to get SLS to...