Tim Moon issues

Results: 31 issues of Tim Moon

Add support for Apex distributed Adam optimizer with GPT-3

# What does this PR do ? Adds support for training GPT-3 with the [Apex implementation of the ZeRO optimizer](https://github.com/NVIDIA/apex/blob/master/apex/contrib/optimizers/distributed_fused_adam.py). **Collection**: NLP # Changelog - Add option for `distributed_fused_adam` optimizer...

Hang when convolution layers have unused bias weights

The current issue is mitigated by https://github.com/LLNL/lbann/pull/2073. It now takes active effort to create unused bias weights in the convolution and fully-connected layers. This issue is a record in case...

bug

Support 3x3 deconvolution with distconv

Our current distconv support for deconvolution is limited to 2x2 deconvolution with stride 2. Fortunately, we already have implementations for 3x3 deconvolution: just swap the forward and backward steps from...

enhancement

review requested

do_not_merge

Remove deprecated Python scripts for ONNX conversion and plotting

The ONNX conversion scripts are very old and broken. We have had a user run into problems when trying to use them, so I think it would be safer to...

bug

review requested

refactor

Error when mini-batch size is smaller than number of processes

I encounter an error when I change the mini-batch size in one of the Bamboo unit tests, e.g.: https://github.com/LLNL/lbann/blob/9a9e31cb33fd5460ad6da335ff647aad79088049/bamboo/unit_tests/test_unit_layer_identity.py#L45 If the mini-batch size is less than the number of processes,...

bug

Notation for tensor and matrix dimensions are inconsistent

In NCHW tensor notation, the last dimension is the contiguous dimension. For column-major matrix notation, the first dimension is the contiguous dimension. We haven't needed to think that much about...

question

refactor

stale
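The contiguity mismatch described in the notation issue above can be illustrated with element strides. This is a minimal sketch in plain Python, not code from LBANN: the helper names `row_major_strides` and `col_major_strides` and the example dimensions are hypothetical, chosen only to show that the last NCHW dimension and the first column-major dimension are the ones with stride 1.

```python
def row_major_strides(dims):
    """Element strides when the LAST dimension is contiguous (NCHW-style)."""
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return tuple(strides)


def col_major_strides(dims):
    """Element strides when the FIRST dimension is contiguous (column-major)."""
    strides = [1] * len(dims)
    for i in range(1, len(dims)):
        strides[i] = strides[i - 1] * dims[i - 1]
    return tuple(strides)


# Hypothetical NCHW tensor: N=2, C=3, H=4, W=5. W has stride 1.
nchw = row_major_strides((2, 3, 4, 5))
print(nchw)  # (60, 20, 5, 1)

# Hypothetical column-major matrix with 4 rows and 5 columns. Rows have stride 1.
matrix = col_major_strides((4, 5))
print(matrix)  # (1, 4)
```

Reading the same buffer under both conventions reverses the order of the dimensions, which is one way the two notations can end up looking inconsistent.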

Bamboo tests with Python frontend can't switch between builds

Support distributed Adam with T5 and support overlapped grad reductions with pipeline parallelism

Fix bug with grad clipping and distributed Adam

Include param ids in mutex timeout warning