Same model (Llama 7B): why does ZeRO-3 initialize different parameter sizes?
When I use ZeRO-3 to initialize the network, if my Llama model is subclassed as follows:
```python
from transformers import LlamaConfig, LlamaModel, LlamaForCausalLM
from utils.model.model_utils import create_hf_model  # DeepSpeed-Chat utility

class FlashLlamaModel(LlamaModel):
    def __init__(self, config: LlamaConfig):
        super().__init__(config)

class FlashLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        self.model = FlashLlamaModel(config)  # rebinds the model built by the parent

actor_model = create_hf_model(
    model_class=FlashLlamaForCausalLM,
    model_name_or_path=actor_model_name_or_path,
    tokenizer=self.tokenizer,
    ds_config=ds_config,
    disable_dropout=self.args.disable_actor_dropout,
    debug=debug)
```
then the log reports about 14B parameters initialized:

```
[2023-09-27 01:48:00,149] [INFO] [partition_parameters.py:454:__exit__] finished initializing model with 13.63B parameters
```
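For reference, the count in this log line accumulates as modules are constructed inside `deepspeed.zero.Init`: parameters are partitioned, and added to the total, at the moment each `nn.Module` is built, so a submodule that is constructed and then rebound is still included. A minimal toy sketch (hypothetical `Inner`/`Outer` classes, not the actual Llama code) that should reproduce the doubled count:

```python
import torch.nn as nn
import deepspeed

class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1024, 1024)

class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = Inner()  # first copy: partitioned by ZeRO-3 when built
        self.inner = Inner()  # rebinding builds a second copy, counted again

# Run under a distributed launcher (e.g. deepspeed or torchrun); the
# "finished initializing model with ..." log should report roughly twice
# the parameters of a single Inner.
with deepspeed.zero.Init():
    model = Outer()
```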
But when the parent's `self.model` is not overridden, the log output goes back to 7B:
```python
class FlashLlamaModel(LlamaModel):
    def __init__(self, config: LlamaConfig):
        super().__init__(config)

class FlashLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):
        super().__init__(config)

actor_model = create_hf_model(
    model_class=FlashLlamaForCausalLM,
    model_name_or_path=actor_model_name_or_path,
    tokenizer=self.tokenizer,
    ds_config=ds_config,
    disable_dropout=self.args.disable_actor_dropout,
    debug=debug)
```
Log output: about 7B parameters are initialized.

```
[2023-09-27 01:52:23,475] [INFO] [partition_parameters.py:454:__exit__] finished initializing model with 6.93B parameters
```
Since `LlamaModel` itself has no attribute named `model`, it seems the assignment `self.model = FlashLlamaModel(config)` doesn't rewrite the existing submodule but adds a second full copy of the model on top of the one built in `LlamaForCausalLM.__init__`, which would explain the doubled count.
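If the goal is to swap in the Flash variant without building the stock `LlamaModel` first, one option is to skip the parent constructor and build only the Flash model. This is a sketch, not a tested fix; it assumes `LlamaForCausalLM.__init__` only builds `self.model` and `self.lm_head` before calling `post_init()`, which is worth checking against the installed transformers version:

```python
import torch.nn as nn
from transformers import LlamaConfig, LlamaModel, LlamaForCausalLM
from transformers.models.llama.modeling_llama import LlamaPreTrainedModel

class FlashLlamaModel(LlamaModel):
    def __init__(self, config: LlamaConfig):
        super().__init__(config)

class FlashLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):
        # Call the grandparent initializer directly so the stock LlamaModel
        # is never constructed (and never partitioned by ZeRO-3); only the
        # Flash variant is built.
        LlamaPreTrainedModel.__init__(self, config)
        self.model = FlashLlamaModel(config)
        self.vocab_size = config.vocab_size
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()
```

Alternatively, since `FlashLlamaModel` here adds no parameters of its own, rebinding the class of the existing instance (`self.model.__class__ = FlashLlamaModel` after `super().__init__(config)`) would avoid constructing a second model at all.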