Stas Bekman


Hi @dennisbakhuis. In a bit we will move this to https://github.com/microsoft/DeepSpeed/issues, as this is not an integration problem. I discovered this recently when trying to build a multi-modal model...

Meanwhile, the workaround I used is this: since one of the models was much smaller than the other, I initialized the smaller one w/o `zero.Init` and the other normally w/...
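
To illustrate, here is a minimal sketch of that pattern with stand-in models (not the actual code from that project):

```
import torch
import deepspeed

ds_config = dict(train_batch_size=1, zero_optimization=dict(stage=3))

# the small model is built outside zero.Init, so it is fully
# materialized on every rank (cheap, since it is small)
small = torch.nn.Linear(16, 16)

# the big model is built under zero.Init, so its parameters are
# partitioned across ranks at construction time (ZeRO stage 3)
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    big = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(8)])
```

Since only one `zero.Init` context is ever entered, the nesting problem described below never arises.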

The minimal repro is just this:

```
from transformers import DonutProcessor, VisionEncoderDecoderModel
import torch
import deepspeed
from transformers.deepspeed import HfDeepSpeedConfig

ds_config = dict(train_batch_size=1, zero_optimization=dict(stage=3))
dschf = HfDeepSpeedConfig(ds_config)  # keep this...
```
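
The comment is cut off there; a plausible continuation of the repro, assuming the Donut base checkpoint (the checkpoint name is my guess), would be:

```
# hypothetical continuation: with the HfDeepSpeedConfig object alive and
# ZeRO stage 3 configured, from_pretrained wraps model construction in
# zero.Init, and VisionEncoderDecoderModel then enters further zero.Init
# contexts via its internal from_config calls, which is what fails
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
```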

The cause proved to be two `from_config` calls, each internally invoking a `zero.Init` context. https://github.com/huggingface/transformers/blob/97d3390fc8edb210fcf0aad6a079406b018655b9/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py#L191-L195
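
In simplified form, what each such `from_config` call does under ZeRO-3 looks roughly like this (a paraphrase of the transformers logic, not an exact copy):

```
import deepspeed
from transformers.deepspeed import is_deepspeed_zero3_enabled, deepspeed_config

def from_config_simplified(cls, config):
    # when ZeRO-3 is enabled, model construction is wrapped in zero.Init;
    # two such calls inside one parent model yield nested zero.Init contexts
    if is_deepspeed_zero3_enabled():
        with deepspeed.zero.Init(config_dict_or_path=deepspeed_config()):
            return cls(config)
    return cls(config)
```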

BTW, do you have enough CPU memory to load this model? In that case a temp hack would be very simple: just disable the `zero.Init` contexts directly:

```
diff --git...
```
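
The diff is truncated; as an alternative illustration of the same temp hack (my sketch, not the original diff), one could monkeypatch `zero.Init` into a no-op before loading the model:

```
import contextlib
import deepspeed

# neutralize zero.Init so the model is materialized normally in CPU
# memory instead of being partitioned at construction time
deepspeed.zero.Init = lambda *args, **kwargs: contextlib.nullcontext()
```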

OK, I reduced the problem to this repro:

```
import torch
import deepspeed

ds_config = dict(train_batch_size=1, zero_optimization=dict(stage=3))

class MyModel(torch.nn.Module):
    def __init__(self, m1):
        super().__init__()
        self.m1 = m1

with deepspeed.zero.Init(config_dict_or_path=ds_config):
    with deepspeed.zero.Init(config_dict_or_path=ds_config):
        ...
```
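
The snippet is cut off inside the nested context; a hedged guess at how it continues, using `torch.nn.Linear` as a stand-in sub-module:

```
# hypothetical completion of the truncated repro: build a sub-module
# inside the inner (nested) zero.Init and wrap it via the outer one,
# mirroring the two internal from_config calls discussed above
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    with deepspeed.zero.Init(config_dict_or_path=ds_config):
        m1 = torch.nn.Linear(4, 4)
    model = MyModel(m1)
```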

OK, I filed the report here: https://github.com/microsoft/DeepSpeed/issues/2811

> deepspeed.zero.Init should only be called once at the moment, yes

> What is unclear to me is who to "blame" (in a positive sense (-;).

... If you read...

Thank you for doing the experiment, Dennis. Glad to hear it worked. The DeepSpeed team is actively working on resolving these two issues: https://github.com/microsoft/DeepSpeed/issues/2811 and https://github.com/microsoft/DeepSpeed/issues/2812, so hopefully we should have...