WIP Add HuggingFace arg so that arch is automatic
This pull request adds automated parameter calculation for all Hugging Face models.
Expected Behaviour:
```
python transformer_mem.py --hf_model_name_or_path meta-llama/Llama-2-7b-hf --num-gpus 8 --zero-stage 3 --batch-size-per-gpu 2 --sequence-length 4096
```
Ref: #1
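A rough sketch of how the architecture arguments could be derived from the Hugging Face config (the mapping onto `ffn_expansion_factor` and `kv_size_ratio` is an assumption about transformer_mem.py's conventions, not necessarily the final implementation):

```python
# Sketch: derive transformer_mem.py architecture args from a Hugging Face config.
# Requires the transformers library; gated repos may need `huggingface-cli login`.
from transformers import AutoConfig

def args_from_hf(model_name_or_path):
    cfg = AutoConfig.from_pretrained(model_name_or_path)
    heads = cfg.num_attention_heads
    return {
        "vocab_size": cfg.vocab_size,
        "hidden_size": cfg.hidden_size,
        "num_attention_heads": heads,
        "num_layers": cfg.num_hidden_layers,
        # Expansion factor relative to hidden_size (e.g. 14336 / 4096 = 3.5 for Llama-3).
        "ffn_expansion_factor": cfg.intermediate_size / cfg.hidden_size,
        # GQA models expose num_key_value_heads; fall back to full attention otherwise.
        "kv_size_ratio": getattr(cfg, "num_key_value_heads", heads) / heads,
    }

print(args_from_hf("meta-llama/Llama-2-7b-hf"))
```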
@Quentin-Anthony,
If the user passes in a value that conflicts with the Hugging Face model that was passed, do we ignore it or take it into consideration?
For example:
```
python transformer_mem.py \
    --hf_model_name_or_path meta-llama/Llama-2-7b-hf \
    --num-gpus 8 \
    --zero-stage 3 \
    --batch-size-per-gpu 2 \
    --sequence-length 4096 \
    --num_attention_heads 16
```
In the above example, `num_attention_heads` is both explicitly passed and implicitly provided by the model.
Currently, this is producing some wrong parameter estimates:
```
Calculating memory with training configuration: {'hf_model_name_or_path': 'NousResearch/Hermes-2-Pro-Llama-3-8B', 'num_gpus': 8, 'tensor_parallel_size': 1, 'pipeline_parallel_size': 1, 'partition_activations': False, 'zero_stage': 3, 'zero_allgather_bucket_size': 500000000.0, 'zero3_max_live_params': 1000000000.0, 'checkpoint_activations': False, 'batch_size_per_gpu': 2, 'sequence_length': 4096, 'vocab_size': 128288, 'hidden_size': 4096, 'num_attention_heads': 32, 'num_layers': 32, 'ffn_expansion_factor': 3.5, 'infer': False, 'kv_size_ratio': 0.25, 'is_mixed_precision': True, 'high_prec_bytes_per_val': 4, 'low_prec_bytes_per_val': 2, 'bytes_per_grad_ele': 4, 'num_experts': 0, 'expert_parallelism': 1, 'misc_mem_gib': 0}

Number of Parameters: 6.17 B
```
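As a rough back-of-the-envelope check against this config (assuming a LLaMA-style model with untied embeddings, no biases, GQA attention, and a gated SwiGLU MLP; this is only a sanity check, not part of the PR), the expected count is about 8 B, and the 6.17 B figure is roughly what you would get if the gated MLP's third projection were not counted:

```python
# Rough sanity check of the parameter count for the config printed above.
hidden, layers, vocab = 4096, 32, 128288
ffn = int(3.5 * hidden)          # intermediate_size = 14336
kv_ratio = 0.25                  # GQA: 8 KV heads / 32 query heads

embed = 2 * vocab * hidden                       # input embedding + LM head
attn = hidden * hidden * (2 + 2 * kv_ratio)      # Q, O full size; K, V reduced by GQA
mlp_gated = 3 * hidden * ffn                     # gate, up, down projections
mlp_ungated = 2 * hidden * ffn                   # up, down only

print((embed + layers * (attn + mlp_gated)) / 1e9)    # ~8.03 B (expected for an 8B model)
print((embed + layers * (attn + mlp_ungated)) / 1e9)  # ~6.15 B (close to the 6.17 B above)
```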
I think if the user provides an arg, we overwrite the HF config on that value. All overwritten values should get a print (e.g. "overwriting HF num_attention_heads config value (x) with user arg (y)")
How do we check if the value is user-provided or a default value?
Say the user passes `num_attention_heads` as 64, which is also the default value; the args would not be able to tell the two apart.
Instead, maybe we could keep the default values in a separate dictionary and have the parser use `None` as the default for every argument, so we can tell when we get user input and when we are falling back to a default value.
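Something like this minimal sketch (argument names and default values are illustrative, not the actual ones in transformer_mem.py):

```python
# Sketch: parser defaults are all None; real defaults live in a separate dict,
# so an arg is "user-provided" exactly when it is not None after parsing.
import argparse

DEFAULTS = {"num_attention_heads": 64, "hidden_size": 4096}  # illustrative values

parser = argparse.ArgumentParser()
parser.add_argument("--num_attention_heads", type=int, default=None)
parser.add_argument("--hidden_size", type=int, default=None)
args = parser.parse_args()

for name, default in DEFAULTS.items():
    if getattr(args, name) is None:      # not supplied on the command line
        setattr(args, name, default)     # fall back to the stored default
```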
What do you think? @Quentin-Anthony
But this would mean that we have no default values and that the user needs to pass everything? If I'm misunderstanding, maybe just implement what you're describing real quick and we can iterate.
Hi @Quentin-Anthony, I added a default-value dictionary to handle the `None` defaults: user input is detected when an arg is not `None`, and the remaining values are filled in ("replaced") from the HF config (values already set in the args are skipped).
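Roughly this merge logic (a sketch with a hypothetical `hf_config` dict of values read from the Hugging Face config, not the exact code in the PR):

```python
# Sketch: fill args from the HF config, but keep (and report) user-provided values.
def apply_hf_config(args, hf_config):
    for name, hf_value in hf_config.items():
        user_value = getattr(args, name, None)
        if user_value is None:
            setattr(args, name, hf_value)            # take the HF config value
        elif user_value != hf_value:
            print(f"overwriting HF {name} config value ({hf_value}) "
                  f"with user arg ({user_value})")   # user arg wins, as discussed
    return args
```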
Hi @Quentin-Anthony, wanted to check in about this PR? Is this still required? Is something missing here?
Yep still needed! Reviewing now.
I rebased, and for some reason this PR's "files changed" view is now showing all the rebase changes? Gonna try to close and reopen to see if that fixes it.
EDIT: That did it!
Thank you @Quentin-Anthony! I enjoyed working on this with you as my first open source PR. :)