Stas Bekman
As we know, DS flattens individual tensors by optimizer group, so that each tensor and its attributes can no longer be accessed by the user once the DS engine takes over. As...
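To make the flattening concrete, here is a minimal sketch of the general pattern using PyTorch's flatten helpers (illustrative only, not DeepSpeed's actual internals): once the tensors of a group are fused into one flat buffer and re-pointed at views of it, the user-held tensors no longer own independent storage.

```python
# A hedged sketch of per-group tensor flattening -- not DeepSpeed's
# real implementation.
import torch
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

group = [torch.randn(3), torch.randn(5)]  # one "optim group"
flat = _flatten_dense_tensors(group)      # single contiguous buffer
for p, view in zip(group, _unflatten_dense_tensors(flat, group)):
    p.data = view  # each original tensor now aliases a slice of `flat`

# From here on, an engine would work on `flat`; the standalone tensors
# are mere views of it:
flat.mul_(0)
print(group[0])  # tensor([0., 0., 0.])
```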
**Is your feature request related to a problem? Please describe.** When a repo is updated, the user ends up with dead files in the cache that are no longer used...
### Actual Behavior

When running `conda-build`, it first copies the whole working directory (!)

### Expected Behavior

I'm not sure why you need to copy the whole thing, but if...
As discussed here https://github.com/huggingface/blog/pull/507#discussion_r974616495 and multiple times before, the asset prefix ID (https://github.com/huggingface/blog/tree/main/assets) is not used anywhere, and it very often interferes with blog post PRs when multiple new...
When `pytest-pspec` is installed, `pytest --collect-only -q` fails to deal with tests that have no classes; this seems to be the case with or without unittest. Here is a simple test...
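For context, a class-less test of the kind the report describes might look like this (a hypothetical minimal repro; the actual test is cut off above):

```python
# test_sample.py -- hypothetical module-level test with no class,
# the shape of test that reportedly breaks pytest-pspec collection.
def test_addition():
    assert 1 + 1 == 2
```

With `pytest-pspec` installed, `pytest --collect-only -q test_sample.py` would then be expected to fail during collection.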
Getting pure BF16 training (not mixed) running with `AnyPrecisionAdamW`, with the optimizer states also in BF16. I think it should require 8 bytes per param, instead of the 18 needed for mixed-precision training - i.e....
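For reference, here is one common way those two figures are derived, assuming Adam-style optimizer states and FP32 gradients in the mixed-precision case (exact splits vary by implementation):

```python
# Hedged bytes-per-param accounting under the assumptions above.
mixed_precision = (
    2    # bf16 weights
    + 4  # fp32 gradients
    + 4  # fp32 master weights
    + 4  # fp32 exp_avg (momentum)
    + 4  # fp32 exp_avg_sq (variance)
)  # == 18 bytes/param

pure_bf16 = (
    2    # bf16 weights
    + 2  # bf16 gradients
    + 2  # bf16 exp_avg
    + 2  # bf16 exp_avg_sq
)  # == 8 bytes/param
```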
Once the issue of https://github.com/microsoft/DeepSpeed/issues/2811 is resolved, we immediately hit a new problem with dynamic class importing. This time the situation can first be demonstrated with the real code of:...
**Describe the bug**

## Repro

Nested `zero.Init` leads to an infinite recursion:

```
import torch
import deepspeed

ds_config = dict(train_batch_size=1, zero_optimization=dict(stage=3))

class MyModel(torch.nn.Module):
    def __init__(self, m1):
        super().__init__()
        self.m1 = m1
...
```
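Since the repro is cut off, here is a hedged guess at how such a nested-`zero.Init` setup typically looks (`deepspeed.zero.Init` as a context manager is real API; the continuation itself is an assumption, not the issue's actual code):

```python
# Hypothetical continuation of the truncated repro: entering zero.Init
# a second time while the first context is still active. Run under the
# deepspeed launcher, since zero.Init sets up distributed state.
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    inner = torch.nn.Linear(2, 2)
    with deepspeed.zero.Init(config_dict_or_path=ds_config):  # nested
        model = MyModel(inner)  # reportedly recurses infinitely
```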
This PR removes `pass` statements where they aren't needed
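For illustration, a typical unneeded `pass` (hypothetical example, not taken from the diff) is one that sits alongside another statement in the same body:

```python
class NotImplementedYet(Exception):
    """Raised by unfinished stubs."""  # the docstring already forms the body,
    pass                               # so this `pass` is redundant
```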
**Is your feature request related to a problem? Please describe.** We have proven with the BLOOM training that BF16 is by far superior to FP16 for mixed-precision training,...