Tim Moon

Results: 31 issues by Tim Moon

# What does this PR do?
Adds support for training GPT-3 with the [Apex implementation of the ZeRO optimizer](https://github.com/NVIDIA/apex/blob/master/apex/contrib/optimizers/distributed_fused_adam.py).

**Collection**: NLP

# Changelog
- Add option for `distributed_fused_adam` optimizer...
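Since the PR text is truncated, a minimal pure-NumPy sketch of the ZeRO-style idea behind a distributed fused Adam optimizer may help: each rank stores and updates only its shard of the optimizer state, then the updated parameter shards are gathered back. All names here (`adam_step`, `zero1_update`) are hypothetical illustrations, not the Apex API.

```python
import numpy as np

def adam_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, t=1):
    """One Adam update on a parameter (or parameter shard)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def zero1_update(params, grads, world_size, lr=1e-3):
    """ZeRO-1-style update: each 'rank' owns 1/world_size of the
    optimizer state, updates its own shard, and the shards are then
    concatenated (standing in for an all-gather)."""
    p_chunks = np.array_split(params, world_size)
    g_chunks = np.array_split(grads, world_size)
    shards = []
    for rank in range(world_size):
        p, g = p_chunks[rank], g_chunks[rank]
        m = np.zeros_like(p)  # each rank stores only its shard of m, v
        v = np.zeros_like(p)
        p_new, _, _ = adam_step(p, g, m, v, lr=lr)
        shards.append(p_new)
    return np.concatenate(shards)
```

Because Adam is elementwise, the sharded update matches the unsharded one exactly; the savings come from each rank holding only 1/N of the `m` and `v` state.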

The current issue is mitigated by https://github.com/LLNL/lbann/pull/2073. It now takes active effort to create unused bias weights in the convolution and fully-connected layers. This issue is a record in case...

bug

Our current distconv support for deconvolution is limited to 2x2 deconvolution with stride 2. Fortunately, we already have implementations for 3x3 deconvolution: just swap the forward and backward steps from...
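The swap can be illustrated in 1D (a hedged sketch, not the actual distconv code): the backward-data pass of a stride-1 convolution is exactly the forward pass of the corresponding deconvolution (transposed convolution).

```python
import numpy as np

def conv1d_forward(x, w):
    """'Valid' stride-1 cross-correlation: the forward pass of a conv layer."""
    n, m = len(x), len(w)
    return np.array([np.dot(x[i:i + m], w) for i in range(n - m + 1)])

def conv1d_backward_data(dy, w, n):
    """Backward-data pass of the conv above: dx[j] = sum_i dy[i] * w[j - i]."""
    dx = np.zeros(n)
    for i, g in enumerate(dy):
        dx[i:i + len(w)] += g * w
    return dx

def deconv1d_forward(z, w, n):
    """Deconvolution forward: scatter each input element through the
    kernel -- the same computation as conv1d_backward_data."""
    out = np.zeros(n)
    for i, val in enumerate(z):
        out[i:i + len(w)] += val * w
    return out
```

The scatter-add in `deconv1d_forward` is identical to the conv backward-data pass, which is why swapping the conv forward/backward implementations yields a deconvolution.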

enhancement
review requested
do_not_merge

The ONNX conversion scripts are very old and broken. We have had a user run into problems when trying to use them, so I think it would be safer to...

bug
review requested
refactor

I encounter an error when I change the mini-batch size in one of the Bamboo unit tests, e.g.: https://github.com/LLNL/lbann/blob/9a9e31cb33fd5460ad6da335ff647aad79088049/bamboo/unit_tests/test_unit_layer_identity.py#L45 If the mini-batch size is less than the number of processes,...
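A sketch of why small mini-batches are a problem, assuming the common convention of splitting the mini-batch as evenly as possible across ranks (the function name is hypothetical, not LBANN code):

```python
def samples_per_rank(mini_batch_size, num_procs):
    """Split a mini-batch as evenly as possible across ranks; the
    remainder goes to the lowest-numbered ranks (assumed convention)."""
    base, rem = divmod(mini_batch_size, num_procs)
    return [base + (1 if rank < rem else 0) for rank in range(num_procs)]
```

With a mini-batch of 2 on 4 processes this gives `[1, 1, 0, 0]`: the last two ranks receive zero samples, an edge case that code assuming every rank has data can easily mishandle.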

bug

In NCHW tensor notation, the last dimension is the contiguous dimension. For column-major matrix notation, the first dimension is the contiguous dimension. We haven't needed to think that much about...
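A quick NumPy check of the two conventions: a C-ordered (row-major) NCHW tensor is contiguous in its last dimension, while the same shape in Fortran order (as a column-major matrix library sees it) is contiguous in its first.

```python
import numpy as np

# Row-major (C-order) NCHW tensor: the last (W) dimension is contiguous,
# i.e. its stride equals one element.
nchw = np.zeros((2, 3, 4, 5))
last_dim_contiguous = nchw.strides[-1] == nchw.itemsize

# Same shape in column-major (Fortran) order: now the first dimension
# is the contiguous one.
colmajor = np.asfortranarray(nchw)
first_dim_contiguous = colmajor.strides[0] == colmajor.itemsize
```

So passing an NCHW buffer to a column-major matrix routine implicitly reverses the dimension order, which is the mismatch at issue here.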

question
refactor

Our CI tests work well when we run with a single build per system, but I suspect things will go poorly if we try testing multiple compilers or multiple build...

bug
CI

# What does this PR do?
Generalizes distributed Adam support for GPT-3 to T5 and other Megatron-LM models, and implements several performance optimizations.

**Collection**: NLP

# Changelog
- ...

I've gotten incorrect results when using distributed Adam to train GPT-3 in FP16, due to a bug in the interaction between gradient clipping and gradient scaling. In particular, there's an incorrect assumption that gradient...
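Since the exact assumption is truncated above, here is a generic illustration (names are illustrative, not the NeMo/Apex code) of how clipping interacts badly with FP16 loss scaling when the gradients are not unscaled first:

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    total = np.sqrt(sum(np.sum(g * g) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads

scale = 1024.0                       # FP16 loss scale
true_grads = [np.array([3.0, 4.0])]  # true global norm = 5
scaled = [g * scale for g in true_grads]

# Wrong: clipping the still-scaled gradients compares scale * norm
# against max_norm, so the clip factor is off by the loss scale.
wrong = [g / scale for g in clip_grad_norm(scaled, max_norm=1.0)]

# Right: unscale first, then clip against the true gradient norm.
right = clip_grad_norm([g / scale for g in scaled], max_norm=1.0)
```

In this sketch the wrong ordering produces gradients roughly `scale` times too small; a bug of this flavor silently degrades training rather than crashing.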

# What does this PR do?
Add a one line overview of what this PR aims to accomplish.

**Collection**: [Note which collection this PR will affect]

# Changelog
- ...

core
stale