Less Wright

Results: 81 comments by Less Wright

Hi @hiyyg - will definitely add it after I get a chance to test it out on some benchmarks. Thanks for the pointer!

Hi @huangnengCSU - fully agree with your point! We're working on a Ranger22 version, and as part of that we hope to document things better than we had time for...

Hi @neuronflow @ioanvl - thanks for the issue. The warning is now resolved in the file ranger2020.py. It also adds the latest gc2 from the gc developer, so I wanted to...

Hi @jetjodh, I agree but unfortunately I don't work in Keras/TF anymore and don't have the expertise to do it. I'll leave the issue open and hopefully someone with the...

from review discussion: "actually I think having this concept of 'smart defaults' where it attempts appropriate BF16 but rolls back to FP32 when not supported is a nice user experience....
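The "smart defaults" idea quoted above can be sketched in a few lines. This is only an illustration, not the code from the review: the function name `resolve_dtype` and the `bf16_supported` probe are hypothetical, and the dtypes are plain strings so the sketch runs without any framework. In a PyTorch setup the probe could plausibly be `torch.cuda.is_bf16_supported`, but that wiring is an assumption here.

```python
from typing import Callable


def resolve_dtype(requested: str, bf16_supported: Callable[[], bool]) -> str:
    """Return the dtype name to train with.

    Hypothetical helper: honors a BF16 request when the hardware probe
    reports support, and otherwise rolls back to FP32.
    """
    if requested == "bf16" and not bf16_supported():
        # Roll back rather than fail, so the default "just works".
        return "fp32"
    return requested


# Usage: the probe is injected, so the fallback is easy to exercise.
print(resolve_dtype("bf16", lambda: True))   # hardware reports BF16 support
print(resolve_dtype("bf16", lambda: False))  # no BF16 support, falls back
```

Passing the probe in as a callable keeps the decision logic separate from the hardware check, so the fallback path can be exercised on any machine.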

Some additional comments on MoE: 1 - A recent Meta paper highlights that you can train the experts independently (as you would ordinary dense models) and then merge the FFN layers to...

@lucasjinreal - when you hit your OOM using zero3, is it during the forward pass or backward pass? (sounds like forward pass but want to confirm). If it's during the...

PR ready for review: https://github.com/pytorch-labs/torchtrain/pull/14

Short term, the stride check can be removed to explore tracing (this check is rarely needed, as confirmed on llama_7b). Longer term this will either need a refactor to support...