Stas Bekman

Results: 664 comments by Stas Bekman

Very interesting. I have never seen such behavior before. I wasn't part of the DeepSpeed integration in Accelerate, so you probably need to ask there. The HF Trainer integration works...

> > 2. the other question is why, when the model was allocated via `zero.Init` w/o offload, it consumes 10GB of CPU memory and not close to 1GB? I bracketed...
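A minimal sketch of such bracketing, assuming `psutil` for the RSS readings, a bare-bones ZeRO-3 config without offload, and a placeholder model; the exact config and model name are illustrative, not a reproduction of the original measurement:

```python
# Hypothetical sketch: bracket the allocation under zero.Init with CPU RSS
# readings. Run under the `deepspeed` launcher so distributed is initialized.
import psutil
import deepspeed
from transformers import AutoModelForSeq2SeqLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},  # ZeRO-3, no CPU offload configured
}

def cpu_rss_gib():
    # resident set size of this process, in GiB
    return psutil.Process().memory_info().rss / 2**30

before = cpu_rss_gib()
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-3b")
after = cpu_rss_gib()
print(f"CPU RSS delta around zero.Init: {after - before:.2f} GiB")
```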

> This is due to unnecessary ZeRO stage 3 memory allocation that is exposed because of the shared code base. To address this, I have embarked on the **trivial** task...

Hi Adam, indeed, we have finished training the 176B model, so hopefully this version will accept your work. In the case of JeanZay, from my many experiments, IO seems to be the...

There is one more dimension to this design discussion: whether the additional accumulator is sharded or not. E.g., currently the bf16 optimizer allocates a local accumulator on each...
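To make the sharded-vs-replicated trade-off concrete, here is illustrative back-of-envelope arithmetic; the parameter count and world size are assumptions, not figures from this thread:

```python
# Illustrative arithmetic only: per-rank memory for an additional fp32
# accumulator, replicated on every rank vs. sharded across ranks.
params = 176e9        # assumed parameter count
world_size = 384      # assumed number of ranks
bytes_per_fp32 = 4

replicated_gib = params * bytes_per_fp32 / 2**30  # ~656 GiB per rank
sharded_gib = replicated_gib / world_size         # ~1.7 GiB per rank
print(f"replicated: {replicated_gib:,.0f} GiB per rank")
print(f"sharded:    {sharded_gib:,.2f} GiB per rank")
```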

BF16Optimizer is effectively ZeRO stage 1, but currently it's a bit of a hack and thus uses stage=0; it's just implemented differently, so it can't be used as a normal stage 1 - this...
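For context, a sketch of the config shape this implies, under the assumption that BF16Optimizer is engaged via the `bf16` section while the ZeRO stage is left at 0:

```python
# Assumed config sketch: bf16 enabled, stage reported as 0, even though
# BF16Optimizer itself behaves like a differently implemented stage 1.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 0},
}
```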

> We already have examples for running some transformer networks. For this argument, I think you might just add **local_rank** to your parser arguments, the same as [here](https://github.com/microsoft/DeepSpeedExamples/blob/20ea07a2a069696abec212e25476a9bf76aced70/bing_bert/utils.py#L51-L54). This...
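The linked snippet amounts to an argparse argument along these lines (a sketch mirroring the linked DeepSpeedExamples code; the parser setup around it is illustrative):

```python
# Minimal sketch of adding local_rank to a parser; default -1 conventionally
# means "not launched by a distributed launcher".
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1,
                    help="local rank passed in by the distributed launcher")
args = parser.parse_args()
```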

And the error is right there in your report: https://github.com/microsoft/DeepSpeed/issues/889#issuecomment-806526657

```
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-10.2/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -isystem /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC...
```

This sounds like a permission issue. Try setting `TMPDIR` to another dir that is writable by you, e.g.:

```
mkdir ~/tmp
export TMPDIR=~/tmp
... do the build here ...
```

Yes, once the bf16/z0 PR is merged we can look at fp16/z0 next. The other approach is to:
1. start with random optim states
2. run for some steps with...