DeepSpeedExamples

Same model (Llama 7B): why does ZeRO-3 initialize different parameter sizes?

Open • BaenRH opened this issue on Sep 27, 2023 • 1 comment

When I use ZeRO-3 to initialize the network, if my Llama model is rewritten via inheritance as follows:

from transformers import LlamaConfig, LlamaModel, LlamaForCausalLM

class FlashLlamaModel(LlamaModel):
    def __init__(self, config: LlamaConfig):
        super().__init__(config)

class FlashLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        self.model = FlashLlamaModel(config)

actor_model = create_hf_model(
    model_class=FlashLlamaForCausalLM,
    model_name_or_path=actor_model_name_or_path,
    tokenizer=self.tokenizer,
    ds_config=ds_config,
    disable_dropout=self.args.disable_actor_dropout,
    debug=debug)

Log output: about 14B parameters are initialized: [2023-09-27 01:48:00,149] [INFO] [partition_parameters.py:454:__exit__] finished initializing model with 13.63B parameters

But when I do not override the parent's self.model, the log output goes back to about 7B:

class FlashLlamaModel(LlamaModel):
    def __init__(self, config: LlamaConfig):
        super().__init__(config) 

class FlashLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):
        super().__init__(config) 

actor_model = create_hf_model(
    model_class=FlashLlamaForCausalLM,
    model_name_or_path=actor_model_name_or_path,
    tokenizer=self.tokenizer,
    ds_config=ds_config,
    disable_dropout=self.args.disable_actor_dropout,
    debug=debug)

Log output: about 7B parameters are initialized: [2023-09-27 01:52:23,475] [INFO] [partition_parameters.py:454:__exit__] finished initializing model with 6.93B parameters

BaenRH • Sep 27 '23, 01:09

LlamaForCausalLM already builds self.model inside super().__init__(config), so the extra self.model = FlashLlamaModel(config) does not rewrite those parameters; it constructs a second backbone in addition, and ZeRO-3 counts every parameter created inside its init context, which is why the log reports roughly twice the size.
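
A minimal sketch of one way to avoid building the backbone twice, assuming a transformers version in which LlamaForCausalLM.__init__ sets up self.model, self.vocab_size and self.lm_head (these attribute names are taken from that assumption and may differ between versions): skip LlamaForCausalLM.__init__ and construct the sub-modules directly.

from torch import nn
from transformers import LlamaConfig
from transformers.models.llama.modeling_llama import (
    LlamaForCausalLM,
    LlamaModel,
    LlamaPreTrainedModel,
)

class FlashLlamaModel(LlamaModel):
    def __init__(self, config: LlamaConfig):
        super().__init__(config)

class FlashLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config: LlamaConfig):
        # Skip LlamaForCausalLM.__init__ (which would build a plain LlamaModel
        # backbone) and set up the sub-modules directly, so only one backbone
        # is ever constructed under the ZeRO-3 init context.
        LlamaPreTrainedModel.__init__(self, config)
        self.model = FlashLlamaModel(config)
        self.vocab_size = config.vocab_size
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()

Only one set of Llama weights is then created inside create_hf_model's ZeRO-3 init context, so the reported count should stay near 7B.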

EeyoreLee • Dec 14 '23, 06:12
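
For completeness, a rough way to check which parameters actually remain registered on the returned module (actor_model below refers to the object returned by create_hf_model in the snippets above; under ZeRO-3, partitioned parameters carry a ds_numel attribute holding their full element count, while numel() only reflects the local shard):

# Sum the full sizes of the parameters still attached to the module tree.
# Prefer ds_numel (set by ZeRO-3 on partitioned parameters) over numel().
total = sum(getattr(p, "ds_numel", p.numel()) for p in actor_model.parameters())
print(f"parameters registered on the model: {total / 1e9:.2f}B")

Even with the first snippet this should come out near 6.93B, because the LlamaModel built by super().__init__(config) is replaced on self.model and drops out of the module tree; the 13.63B in the log is the total constructed inside the ZeRO-3 init context, not what remains registered.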