Stas Bekman

128 issues by Stas Bekman

As we know, DeepSpeed flattens individual tensors by optimizer group, so each tensor and its attributes can no longer be accessed by the user once the DS engine takes over. As...

enhancement
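The flattening described in the snippet above can be sketched in plain Python (a toy model, not DeepSpeed's actual code; `flatten`/`unflatten` and the offset index are illustrative assumptions):

```python
# Toy sketch of per-group tensor flattening: concatenate each tensor
# (here just a list of floats) into one contiguous buffer and record
# (offset, length) per tensor. Once only the flat buffer is kept, the
# original per-tensor objects are gone -- which is the access problem
# the issue describes.

def flatten(tensors):
    """Flatten a dict of name -> values into one buffer plus an index."""
    flat, index, offset = [], {}, 0
    for name, t in tensors.items():
        index[name] = (offset, len(t))
        flat.extend(t)
        offset += len(t)
    return flat, index

def unflatten(flat, index, name):
    """Recover one tensor's values from the flat buffer via its offsets."""
    off, n = index[name]
    return flat[off:off + n]

params = {"layer1.weight": [0.1, 0.2], "layer1.bias": [0.3]}
flat, index = flatten(params)
print(flat)                               # [0.1, 0.2, 0.3]
print(unflatten(flat, index, "layer1.bias"))  # [0.3]
```

Real ZeRO keeps contiguous flat buffers per optimizer group for efficiency; any user-facing access needs an index like the one above to map names back to slices.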

**Is your feature request related to a problem? Please describe.** When a repo is updated, the user ends up with dead files in the cache that are no longer used...

enhancement
wontfix

### Actual Behavior

When running `conda-build` it first copies the whole working directory (!)

### Expected Behavior

I'm not sure why you need to copy the whole thing, but if...

As discussed here https://github.com/huggingface/blog/pull/507#discussion_r974616495 and multiple times before, the asset prefix ID (https://github.com/huggingface/blog/tree/main/assets) is not being used anywhere, and it very often interferes with blog post PRs when multiple new...

When `pytest-pspec` is installed, `pytest --collect-only -q` fails to handle tests that have no classes; this seems to be the case with or without unittest. Here is a simple test...
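A hypothetical minimal repro (the issue's actual test file is truncated above): a module-level test function with no enclosing class, saved as e.g. `test_simple.py`, which is exactly the classless shape the snippet says `pytest --collect-only -q` trips over when `pytest-pspec` is active:

```python
# test_simple.py -- a test function at module level, no test class.
# Plain pytest collects this fine; the issue reports that collection
# fails once pytest-pspec is installed.

def test_addition():
    assert 1 + 1 == 2
```

Run with `pytest --collect-only -q test_simple.py` to reproduce the collection step in isolation.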

Getting pure bf16 training (not mixed) running with `AnyPrecisionAdamW`, also in bf16. I think it should require 8 bytes per param, instead of 18 for mixed precision training - i.e....
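A back-of-the-envelope sketch of where the 8-vs-18 bytes-per-param figures come from, under one common accounting (fp32 gradients kept in the mixed setup; the exact breakdown is an assumption, not quoted from the issue):

```python
BF16, FP32 = 2, 4  # bytes per element

# Mixed precision AdamW: bf16 weights + fp32 master weights
# + fp32 grads + fp32 momentum + fp32 variance
mixed = BF16 + FP32 + FP32 + FP32 + FP32
print(mixed)  # 18 bytes per param

# Pure bf16 (AnyPrecisionAdamW with bf16 states): bf16 weights
# + bf16 grads + bf16 momentum + bf16 variance
pure = BF16 + BF16 + BF16 + BF16
print(pure)   # 8 bytes per param
```

So dropping the fp32 master copy and fp32 states accounts for the roughly 2.25x memory saving the snippet alludes to.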

Once the issue of https://github.com/microsoft/DeepSpeed/issues/2811 is resolved, we immediately get a new problem with dynamic class importing. This time the situation can first be demonstrated with the real code of:...

bug
training

**Describe the bug**

## Repro

Nested `zero.Init` leads to an infinite recursion:

```
import torch
import deepspeed

ds_config = dict(train_batch_size=1, zero_optimization=dict(stage=3))

class MyModel(torch.nn.Module):
    def __init__(self, m1):
        super().__init__()
        self.m1 = m1...
```

bug
training

This PR removes `pass` calls where they aren't needed

**Is your feature request related to a problem? Please describe.** We have proven with the BLOOM training that BF16 is far superior to FP16 for mixed precision training,...

enhancement
training
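One way to see why bf16 behaves better than fp16 in training, as the BF16 feature request above argues: fp16's 5 exponent bits cap its finite range at 65504, while bf16 shares fp32's 8 exponent bits and so shares fp32's dynamic range. A small stdlib-only sketch (the helper name `fits_fp16` is illustrative):

```python
import struct

def fits_fp16(x):
    """True if x can be packed as IEEE-754 half precision (fp16)
    without overflowing; struct's 'e' format is binary16."""
    try:
        struct.pack('e', x)
        return True
    except OverflowError:
        return False

print(fits_fp16(60000.0))  # True  -- below the fp16 max of 65504
print(fits_fp16(70000.0))  # False -- overflows fp16; fine in bf16/fp32
```

Values that overflow fp16 (and thus force loss scaling there) are unremarkable in bf16, which is the practical basis of the "BF16 is far superior" claim.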