Mitchell Wortsman comments

Results: 88 comments of Mitchell Wortsman

Souping on regression model leading to a drastic drop in accuracy

Hmm. I really don't know. I guess souping + regression may be an open problem. Sorry about that.

RuntimeError: Triton Error [CUDA]: invalid argument

Yep, looks like a similar error to what I'm seeing. I still haven't been able to resolve mine, so if anybody has any advice it would be much appreciated (https://github.com/openai/triton/issues/1512)

RuntimeError: Triton Error [CUDA]: invalid argument

Thanks, yeah, you're probably right. I'm on torch2.0.0+cu118 with triton2.0.0. I'll try torch1.13.1+cu117 and see if that works.

RuntimeError: Triton Error [CUDA]: invalid argument

Thanks, really appreciate it! I'll mess around with versions (probably later this week) and see if that fixes things.

Figure out why AdamW + gradient accumulation leads to different results for test case

A useful test here could also just be a short training run with and without grad accum such that we'd expect the curves to be identical. If the model with...
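The kind of check described in this comment can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it uses a hand-written AdamW update and a two-parameter linear regression in plain Python rather than the repository's model or optimizer, and every name in it (`grad_mse`, `adamw_step`, `train`, `accum`) is hypothetical. It assumes equal-sized micro-batches and a mean-reduced loss, in which case the averaged micro-batch gradients should match the full-batch gradient and the two loss curves should agree up to floating-point noise.

```python
import math

def grad_mse(w, b, batch):
    """Gradient of the mean squared error of y ~ w*x + b over one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def adamw_step(params, grads, state, lr=0.05, betas=(0.9, 0.999), eps=1e-8, wd=0.01):
    """One AdamW update with decoupled weight decay, written out by hand."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = betas[0] * state["m"][i] + (1 - betas[0]) * g
        state["v"][i] = betas[1] * state["v"][i] + (1 - betas[1]) * g * g
        m_hat = state["m"][i] / (1 - betas[0] ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - betas[1] ** t)  # bias-corrected second moment
        p = p - lr * wd * p                          # decoupled weight decay
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
    return new_params

def train(data, steps, accum):
    """Run `steps` optimizer steps, splitting the batch into `accum` equal micro-batches."""
    params = [0.0, 0.0]
    state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
    micro = len(data) // accum
    losses = []
    for _ in range(steps):
        grads = [0.0, 0.0]
        for k in range(accum):
            chunk = data[k * micro:(k + 1) * micro]
            gw, gb = grad_mse(params[0], params[1], chunk)
            grads[0] += gw / accum  # average, not sum, across micro-batches
            grads[1] += gb / accum
        params = adamw_step(params, grads, state)
        w, b = params
        losses.append(sum((w * x + b - y) ** 2 for x, y in data) / len(data))
    return losses

# Toy data on the line y = 3x - 1 (illustrative only).
data = [(x / 4.0, 3.0 * (x / 4.0) - 1.0) for x in range(8)]
curve_full = train(data, steps=50, accum=1)
curve_accum = train(data, steps=50, accum=4)
max_gap = max(abs(a - b) for a, b in zip(curve_full, curve_accum))
print("largest gap between the two loss curves:", max_gap)
```

If the accumulated gradients were summed instead of averaged, or the optimizer stepped once per micro-batch, the two curves would be expected to diverge, which is the kind of discrepancy a test like this is meant to surface.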

Documentation: competing frameworks

Agreed, thanks for raising this. This is in progress, but to provide some updates:
- Added the following sentence to the readme: "In contrast with other repositories such as Megatron, we...

Weird memory usage for 11m vs 160m: similar batch size fits in memory...

This is quite weird. Thanks a lot for documenting that this is an issue. Just curious, does the behavior go away with `--grad-checkpointing`?

Weird memory usage for 11m vs 160m: similar batch size fits in memory...

> @mitchellnw Let me check! To clarify what I'm looking for, I'd expect 8x / 12x / 14x batch sizes to fit for 11m vs 160m?

Yes totally. Sorry about...