Stas Bekman
@Raibows, thank you for providing an easy-to-use repro - you can use `model_name = 'patrickvonplaten/t5-tiny-random'` while debugging this, as it'd be much faster and wouldn't require many resources...
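For reference, a minimal sketch of what swapping in the tiny model looks like (the surrounding training setup is whatever the repro already uses):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# tiny randomly-initialized T5 - loads in seconds and fits easily in memory,
# so the bug can be iterated on quickly before switching back to the real model
model_name = "patrickvonplaten/t5-tiny-random"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```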
Glad you figured it out, @Raibows! That's why we have unit tests - they help us know whether the feature is working correctly, and when it doesn't work for a user, often...
- The first few steps led to an OVERFLOW, so the optimizer didn't run and was thus fast. It then adjusted the scaling factor each step until it reached one that...
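For context, this is the standard dynamic loss-scaling behavior in fp16 training. A minimal sketch using PyTorch AMP (the function and its arguments are illustrative, not code from the actual issue):

```python
import torch

def train_fp16(model, optimizer, dataloader):
    # dynamic loss scaling: start with a large scale, shrink it on overflow,
    # and periodically grow it again once steps are stable
    scaler = torch.cuda.amp.GradScaler()
    for batch in dataloader:
        optimizer.zero_grad()
        with torch.autocast("cuda", dtype=torch.float16):
            loss = model(**batch).loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if inf/nan grads were found (OVERFLOW)
        scaler.update()         # adjusts the scaling factor based on whether the step overflowed
```

The skipped `optimizer.step()` is why those early steps appear fast: only the forward/backward passes ran.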
Thank you for the great and easy-to-reproduce report, @fenchri. Indeed, you found a grad accumulation bug in HF Trainer. This is not a bug in DeepSpeed or its...
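For context, a sketch of the generic gradient accumulation pattern (not HF Trainer's actual code); a bug in this area typically means the loss normalization or the step boundary is off:

```python
def train_with_accumulation(model, optimizer, dataloader, accum_steps=4):
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        loss = model(**batch).loss
        # divide so the accumulated gradient matches one big batch of
        # accum_steps * per_device_batch_size samples
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```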
Hmm, actually looking at earlier steps, this appears to be odd as well:

```
{'loss': 10.8252, 'learning_rate': 3e-05, 'epoch': 0.86}
 40%|████████████████████████████████████████▊ | 4/10 [00:00
```
OK, actually I came up with a fix; will push it shortly for you to try. Please try https://github.com/huggingface/transformers/pull/22098
> Thanks @stas00 for having a look and apologies for the late reply. Indeed, the fix resolves the issue! :tada:

Excellent! Thank you for testing the PR, @fenchri

> I...
Thank you for trying to analyse this, @moyix, and for wanting to make things faster. I dug into it and here is what I have to share with you. #...
I'm curious: are you doing inference or finetuning? For the latter, the init overhead is usually irrelevant. Fast loading is also important for debugging. I think I'm going...
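If load time is what matters (e.g. for inference or debugging), one knob worth knowing about is `low_cpu_mem_usage`, which skips the random weight init that `from_pretrained` otherwise performs before copying in the checkpoint. A sketch with a stand-in model name (recent `transformers` versions may require `accelerate` to be installed for this):

```python
from transformers import AutoModelForCausalLM

# avoids materializing randomly-initialized weights that would immediately be
# overwritten by the checkpoint, which is where much of the load time goes
model = AutoModelForCausalLM.from_pretrained("gpt2", low_cpu_mem_usage=True)
```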
Some additional solutions coming from the PyTorch Slack, where I asked [this question](https://pytorch.slack.com/archives/C3PDTEV8E/p1677813090248699):

1. Install pytorch-nightly following the instructions at https://pytorch.org/get-started/locally/ (or, if you read this later when pytorch==2.0 is released, any 2.0...
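Whichever build you install, you can confirm which one you ended up with (a trivial check, not specific to this thread):

```python
import torch

print(torch.__version__)  # a nightly reports a dev version string, e.g. "2.x.y.dev..."
```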