
WIP Add HuggingFace arg so that arch is automatic

Open · bhavnicksm opened this pull request 1 year ago • 5 comments

This pull request adds automated parameter calculation for all Hugging Face models.

Expected Behaviour:

python transformer_mem.py --hf_model_name_or_path meta-llama/Llama-2-7b-hf --num-gpus 8 --zero-stage 3 --batch-size-per-gpu 2 --sequence-length 4096

Ref: [ #1 ]

bhavnicksm avatar May 08 '24 15:05 bhavnicksm
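For reference, a minimal sketch of how the architecture args might be derived from the HF config, assuming the `transformers` AutoConfig API. The field mapping here follows LlamaConfig and is an assumption about the approach, not the PR's actual code:

```python
# Hypothetical mapping from a HF config to the calculator's args.
# Field names follow LlamaConfig; gated models need an HF auth token.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

derived = {
    "vocab_size": config.vocab_size,
    "hidden_size": config.hidden_size,
    "num_attention_heads": config.num_attention_heads,
    "num_layers": config.num_hidden_layers,
    # Ratios as the calculator defines them:
    "ffn_expansion_factor": config.intermediate_size / config.hidden_size,
    "kv_size_ratio": config.num_key_value_heads / config.num_attention_heads,
}
print(derived)
```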

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 08 '24 15:05 CLAassistant

@Quentin-Anthony,

If the user passes a value that conflicts with the HF model's config, do we ignore it or take it into consideration?

For example:

python transformer_mem.py \
    --hf_model_name_or_path meta-llama/Llama-2-7b-hf \
    --num-gpus 8 \
    --zero-stage 3 \
    --batch-size-per-gpu 2 \
    --sequence-length 4096 \
    --num_attention_heads 16

In the above example, num_attention_heads is both explicitly passed (via the flag) and implicitly provided (via the HF config).

bhavnicksm avatar May 09 '24 13:05 bhavnicksm

Currently I'm getting some wrong parameter estimates:

Calculating memory with training configuration: {'hf_model_name_or_path': 'NousResearch/Hermes-2-Pro-Llama-3-8B', 'num_gpus': 8, 'tensor_parallel_size': 1, 'pipeline_parallel_size': 1, 'partition_activations': False, 'zero_stage': 3, 'zero_allgather_bucket_size': 500000000.0, 'zero3_max_live_params': 1000000000.0, 'checkpoint_activations': False, 'batch_size_per_gpu': 2, 'sequence_length': 4096, 'vocab_size': 128288, 'hidden_size': 4096, 'num_attention_heads': 32, 'num_layers': 32, 'ffn_expansion_factor': 3.5, 'infer': False, 'kv_size_ratio': 0.25, 'is_mixed_precision': True, 'high_prec_bytes_per_val': 4, 'low_prec_bytes_per_val': 2, 'bytes_per_grad_ele': 4, 'num_experts': 0, 'expert_parallelism': 1, 'misc_mem_gib': 0}

Number of Parameters: 6.17 B

bhavnicksm avatar May 09 '24 14:05 bhavnicksm
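For context, a back-of-the-envelope count with the printed config puts the expected total near 8 B. The architectural assumptions here (untied input/output embeddings, GQA with kv_size_ratio 0.25, a gated three-matrix MLP, norms and biases ignored) are assumptions about the model, not about the calculator's internals:

```python
# Rough parameter count for the configuration printed above
# (Llama-3-8B-like shapes).
hidden, layers, vocab = 4096, 32, 128288
ffn_expansion, kv_ratio = 3.5, 0.25

embeddings = 2 * vocab * hidden                         # input + output embedding
attn = 2 * hidden * hidden                              # Q and output projections
attn += 2 * hidden * int(hidden * kv_ratio)             # K and V under GQA
mlp = 3 * hidden * int(hidden * ffn_expansion)          # gate, up, down projections
total = embeddings + layers * (attn + mlp)
print(f"{total / 1e9:.2f} B")                           # -> 8.03 B
```

For comparison, counting only two MLP matrices instead of three gives roughly 6.15 B, close to the 6.17 B reported above, so the gated MLP's third projection is one plausible source of the undercount.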

> If the user passes a value that conflicts with the HF model's config, do we ignore it or take it into consideration? [...] In the above example, num_attention_heads is both explicitly passed (via the flag) and implicitly provided (via the HF config).

I think if the user provides an arg, we overwrite the HF config's value for it. All overwritten values should get a print (e.g., "overwriting HF num_attention_heads config value (x) with user arg (y)").

Quentin-Anthony avatar May 09 '24 14:05 Quentin-Anthony
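A minimal sketch of that policy, with a hypothetical helper; how to detect that an arg was actually user-supplied is the question raised next:

```python
# Hypothetical sketch: user args win over the HF config, and every
# override is announced. `user_provided` stands in for whatever
# detection mechanism gets settled on below.
def resolve(name, hf_value, user_value, user_provided):
    if not user_provided:
        return hf_value
    if user_value != hf_value:
        print(f"overwriting HF {name} config value ({hf_value}) "
              f"with user arg ({user_value})")
    return user_value

# resolve("num_attention_heads", 32, 16, user_provided=True) -> prints, returns 16
```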

> I think if the user provides an arg, we overwrite the HF config's value for it. All overwritten values should get a print (e.g., "overwriting HF num_attention_heads config value (x) with user arg (y)").

How do we check whether the value is user-provided or a default value?

Say the user passes num_attention_heads as 64, which is also the default value; the parsed args would not be able to tell the two apart.

Instead, maybe we could keep the default values in a separate dictionary and have the parser default every argument to None, so we can tell when we got user input and when we should fall back to a default.

What do you think? @Quentin-Anthony

bhavnicksm avatar May 09 '24 14:05 bhavnicksm
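A minimal sketch of the None-sentinel idea described above; the argument names and default values are illustrative:

```python
# Real defaults live in a separate dict; the parser defaults everything
# to None, so any non-None parsed value must have come from the user,
# even when it equals the real default.
import argparse

DEFAULTS = {"num_attention_heads": 64, "hidden_size": 4096}

parser = argparse.ArgumentParser()
parser.add_argument("--num_attention_heads", type=int, default=None)
parser.add_argument("--hidden_size", type=int, default=None)
args = parser.parse_args()

user_provided = {k for k, v in vars(args).items() if v is not None}
for name, default in DEFAULTS.items():
    if name not in user_provided:
        setattr(args, name, default)  # fall back to the real default
```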

> How do we check whether the value is user-provided or a default value? [...] Instead, maybe we could keep the default values in a separate dictionary and have the parser default every argument to None, so we can tell when we got user input and when we should fall back to a default.

But this would mean that we have no default values and that the user needs to pass everything? If I'm misunderstanding, maybe just implement what you're describing real quick and we can iterate.

Quentin-Anthony avatar May 24 '24 04:05 Quentin-Anthony

Hi @Quentin-Anthony, I added a default-value dictionary and set the parser defaults to None: an arg counts as user input when it is not None, and the remaining values are "replaced" from the HF config (values already present in the args are skipped when copying from the config).

bhavnicksm avatar May 24 '24 07:05 bhavnicksm
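Putting the pieces together, a sketch of the resolution order described above: explicit user args beat HF config values, which beat built-in defaults. This is a hypothetical structure; the PR's actual code may differ:

```python
# Hypothetical resolution order: user args > HF config > defaults,
# printing a notice whenever a user arg overrides an HF value.
def resolve_config(user_args, hf_config, defaults):
    resolved = {}
    for name, default in defaults.items():
        user_value = user_args.get(name)  # None unless user-supplied
        hf_value = hf_config.get(name)
        if user_value is not None:
            if hf_value is not None and user_value != hf_value:
                print(f"overwriting HF {name} config value ({hf_value}) "
                      f"with user arg ({user_value})")
            resolved[name] = user_value
        elif hf_value is not None:
            resolved[name] = hf_value
        else:
            resolved[name] = default
    return resolved
```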

Hi @Quentin-Anthony, I wanted to check in on this PR. Is it still required? Is something missing here?

bhavnicksm avatar Aug 11 '24 16:08 bhavnicksm

> Hi @Quentin-Anthony, I wanted to check in on this PR. Is it still required? Is something missing here?

Yep still needed! Reviewing now.

Quentin-Anthony avatar Aug 19 '24 15:08 Quentin-Anthony

I rebased, and for some reason this PR's "files changed" view is now showing all the rebase changes. Gonna try closing and reopening to see if that fixes it.

EDIT: That did it!

Quentin-Anthony avatar Aug 19 '24 16:08 Quentin-Anthony

Thank you, @Quentin-Anthony! I enjoyed working on this with you as my first open-source PR. :)

bhavnicksm avatar Aug 22 '24 14:08 bhavnicksm