Mihir Patel
Oops closed wrong PR, meant to do channels last one
Thanks for bringing this up! Feedback is super helpful.

> (Bug) Nothing is printed indicated that composer is restarting the forward method when grad_accum="auto" is set to `True`

We just merged...
Noting here that we merged in cache clearing and are seeing far lower rates of cache fragmentation. It looks like this has basically resolved the auto grad accum issues, but...
Closing this because it seems resolved -- please feel free to open if you feel any point was not addressed
Looks like this was closed [here](https://github.com/mosaicml/composer/pull/1340)
Would love to have this merged in since I'm also affected by this. @kdaily since you are assigned to the issue, would you mind reviewing this and merging if it...
Can we instead return how many layers were changed? That would be very useful for the agent, because it could then skip surgeries that are no-ops.
> Agreed with @nik-mosaic, having the methods return an `int` would lead to odd bugs when users try:
>
> ```
> model = apply_x(model)
> ```
>
>...
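To make the trade-off in this thread concrete, here is a minimal sketch in plain Python. `Layer`, `Model`, and `apply_x_returning_count` are hypothetical stand-ins invented for illustration; they are not Composer's actual surgery API. It assumes a surgery function that mutates the model in place and returns the number of layers it changed.

```python
class Layer:
    """Hypothetical stand-in for a model layer."""
    def __init__(self, kind):
        self.kind = kind


class Model:
    """Hypothetical stand-in for a model holding a list of layers."""
    def __init__(self, layers):
        self.layers = layers


def apply_x_returning_count(model):
    """Hypothetical surgery: swap every 'old' layer for 'new' in place
    and return how many layers were changed."""
    replaced = 0
    for layer in model.layers:
        if layer.kind == "old":
            layer.kind = "new"
            replaced += 1
    return replaced


# Upside: the caller can tell whether the surgery did anything.
model = Model([Layer("old"), Layer("other"), Layer("old")])
changed = apply_x_returning_count(model)      # two layers swapped
second_pass = apply_x_returning_count(model)  # nothing left to swap: a no-op

# Downside: the reassignment idiom silently replaces the model with an int.
fresh = Model([Layer("old")])
fresh = apply_x_returning_count(fresh)        # fresh is now an int, not a Model
```

Under these assumptions, a zero return value is what would let a caller skip a no-op surgery, while the last line shows the reassignment bug raised in the quoted comment.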
@abhi-mosaic can you please verify the dataloader changes to the GPT YAMLs?
Unit tests verify it parses correctly. I'm skipping tests checking that it actually runs since the cluster is full, and these example YAMLs are all for a deprecated codepath for which...