Stas Bekman
As we know, DS flattens individual tensors by optimizer group, so that each tensor and its attributes can no longer be accessed by the user once the DS engine takes over. As...
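To make the flattening concrete, here is a minimal sketch of the general pattern using PyTorch's flatten helpers (illustrative only, not DeepSpeed's actual internals): once the tensors of a group are fused into one flat buffer and re-pointed at views of it, the user-held tensors no longer own independent storage.

```python
# A hedged sketch of per-group tensor flattening -- not DeepSpeed's
# real implementation.
import torch
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

group = [torch.randn(3), torch.randn(5)]  # one "optim group"
flat = _flatten_dense_tensors(group)      # single contiguous buffer
for p, view in zip(group, _unflatten_dense_tensors(flat, group)):
    p.data = view  # each original tensor now aliases a slice of `flat`

# From here on, an engine would work on `flat`; the standalone tensors
# are mere views of it:
flat.mul_(0)
print(group[0])  # tensor([0., 0., 0.])
```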
**Is your feature request related to a problem? Please describe.** When a repo is updated, the user ends up with dead files in the cache that are no longer used...
### Actual Behavior

When running `conda-build`, it first copies the whole working directory (!)

### Expected Behavior

I'm not sure why you need to copy the whole thing, but if...
As discussed here https://github.com/huggingface/blog/pull/507#discussion_r974616495 and multiple times before, the asset prefix ID (https://github.com/huggingface/blog/tree/main/assets) is not used anywhere, and it very often interferes with blog post PRs when multiple new...
When `pytest-pspec` is installed, `pytest --collect-only -q` fails to deal with tests that have no classes; this seems to be the case with or without unittest. Here is a simple test...
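For context, a class-less test of the kind the report describes might look like this (a hypothetical minimal repro; the actual test is cut off above):

```python
# test_sample.py -- hypothetical module-level test with no class,
# the shape of test that reportedly breaks pytest-pspec collection.
def test_addition():
    assert 1 + 1 == 2
```

With `pytest-pspec` installed, `pytest --collect-only -q test_sample.py` would then be expected to fail during collection.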
Getting pure BF16 training (not mixed) running with `AnyPrecisionAdamW`, with the optimizer states also in BF16. I think it should require 8 bytes per param, instead of the 18 needed for mixed-precision training - i.e....
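For reference, here is one common way those two figures are derived, assuming Adam-style optimizer states and FP32 gradients in the mixed-precision case (exact splits vary by implementation):

```python
# Hedged bytes-per-param accounting under the assumptions above.
mixed_precision = (
    2    # bf16 weights
    + 4  # fp32 gradients
    + 4  # fp32 master weights
    + 4  # fp32 exp_avg (momentum)
    + 4  # fp32 exp_avg_sq (variance)
)  # == 18 bytes/param

pure_bf16 = (
    2    # bf16 weights
    + 2  # bf16 gradients
    + 2  # bf16 exp_avg
    + 2  # bf16 exp_avg_sq
)  # == 8 bytes/param
```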
Once the issue of https://github.com/microsoft/DeepSpeed/issues/2811 is resolved, we immediately hit a new problem with dynamic class importing. This time the situation can first be demonstrated with the real code of:...
**Describe the bug**

## Repro

Nested `zero.Init` leads to an infinite recursion:

```
import torch
import deepspeed

ds_config = dict(train_batch_size=1, zero_optimization=dict(stage=3))

class MyModel(torch.nn.Module):
    def __init__(self, m1):
        super().__init__()
        self.m1 = m1
...
```
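Since the repro is cut off, here is a hedged guess at how such a nested-`zero.Init` setup typically looks (`deepspeed.zero.Init` as a context manager is real API; the continuation itself is an assumption, not the issue's actual code):

```python
# Hypothetical continuation of the truncated repro: entering zero.Init
# a second time while the first context is still active. Run under the
# deepspeed launcher, since zero.Init sets up distributed state.
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    inner = torch.nn.Linear(2, 2)
    with deepspeed.zero.Init(config_dict_or_path=ds_config):  # nested
        model = MyModel(inner)  # reportedly recurses infinitely
```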
This PR removes `pass` statements where they aren't needed
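For illustration, a typical unneeded `pass` (hypothetical example, not taken from the diff) is one that sits alongside another statement in the same body:

```python
class NotImplementedYet(Exception):
    """Raised by unfinished stubs."""  # the docstring already forms the body,
    pass                               # so this `pass` is redundant
```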
**Is your feature request related to a problem? Please describe.** We have proven with the BLOOM training that BF16 is by far superior to FP16 for mixed-precision training,...