
WIP Add HuggingFace arg so that arch is automatic

Open · bhavnicksm opened this pull request 1 year ago • 5 comments

This pull request adds automated parameter calculation for all Hugging Face models.

Expected Behaviour:

python transformer_mem.py --hf_model_name_or_path meta-llama/Llama-2-7b-hf --num-gpus 8 --zero-stage 3 --batch-size-per-gpu 2 --sequence-length 4096

Ref: [ #1 ]

bhavnicksm avatar May 08 '24 15:05 bhavnicksm
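For reference, a minimal sketch of how the architecture args might be derived from the HF config, assuming the `transformers` AutoConfig API. The field mapping here follows LlamaConfig and is an assumption about the approach, not the PR's actual code:

```python
# Hypothetical mapping from a HF config to the calculator's args.
# Field names follow LlamaConfig; gated models need an HF auth token.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

derived = {
    "vocab_size": config.vocab_size,
    "hidden_size": config.hidden_size,
    "num_attention_heads": config.num_attention_heads,
    "num_layers": config.num_hidden_layers,
    # Ratios as the calculator defines them:
    "ffn_expansion_factor": config.intermediate_size / config.hidden_size,
    "kv_size_ratio": config.num_key_value_heads / config.num_attention_heads,
}
print(derived)
```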

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 08 '24 15:05 CLAassistant

@Quentin-Anthony,

If the user passes a value that conflicts with the HF model's config, do we ignore it or take it into consideration?

For example:

python transformer_mem.py \
    --hf_model_name_or_path meta-llama/Llama-2-7b-hf \
    --num-gpus 8 \
    --zero-stage 3 \
    --batch-size-per-gpu 2 \
    --sequence-length 4096 \
    --num_attention_heads 16

In the above example, num_attention_heads is both explicitly passed (via the flag) and implicitly provided (via the HF config).

bhavnicksm avatar May 09 '24 13:05 bhavnicksm

Currently I'm getting some wrong parameter estimates:

Calculating memory with training configuration: {'hf_model_name_or_path': 'NousResearch/Hermes-2-Pro-Llama-3-8B', 'num_gpus': 8, 'tensor_parallel_size': 1, 'pipeline_parallel_size': 1, 'partition_activations': False, 'zero_stage': 3, 'zero_allgather_bucket_size': 500000000.0, 'zero3_max_live_params': 1000000000.0, 'checkpoint_activations': False, 'batch_size_per_gpu': 2, 'sequence_length': 4096, 'vocab_size': 128288, 'hidden_size': 4096, 'num_attention_heads': 32, 'num_layers': 32, 'ffn_expansion_factor': 3.5, 'infer': False, 'kv_size_ratio': 0.25, 'is_mixed_precision': True, 'high_prec_bytes_per_val': 4, 'low_prec_bytes_per_val': 2, 'bytes_per_grad_ele': 4, 'num_experts': 0, 'expert_parallelism': 1, 'misc_mem_gib': 0}

Number of Parameters: 6.17 B

bhavnicksm avatar May 09 '24 14:05 bhavnicksm
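For context, a back-of-the-envelope count with the printed config puts the expected total near 8 B. The architectural assumptions here (untied input/output embeddings, GQA with kv_size_ratio 0.25, a gated three-matrix MLP, norms and biases ignored) are assumptions about the model, not about the calculator's internals:

```python
# Rough parameter count for the configuration printed above
# (Llama-3-8B-like shapes).
hidden, layers, vocab = 4096, 32, 128288
ffn_expansion, kv_ratio = 3.5, 0.25

embeddings = 2 * vocab * hidden                         # input + output embedding
attn = 2 * hidden * hidden                              # Q and output projections
attn += 2 * hidden * int(hidden * kv_ratio)             # K and V under GQA
mlp = 3 * hidden * int(hidden * ffn_expansion)          # gate, up, down projections
total = embeddings + layers * (attn + mlp)
print(f"{total / 1e9:.2f} B")                           # -> 8.03 B
```

For comparison, counting only two MLP matrices instead of three gives roughly 6.15 B, close to the 6.17 B reported above, so the gated MLP's third projection is one plausible source of the undercount.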

> If the user passes a value that conflicts with the HF model's config, do we ignore it or take it into consideration? [...] In the above example, num_attention_heads is both explicitly passed (via the flag) and implicitly provided (via the HF config).

I think if the user provides an arg, we overwrite the HF config's value for it. All overwritten values should get a print (e.g., "overwriting HF num_attention_heads config value (x) with user arg (y)").

Quentin-Anthony avatar May 09 '24 14:05 Quentin-Anthony
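A minimal sketch of that policy, with a hypothetical helper; how to detect that an arg was actually user-supplied is the question raised next:

```python
# Hypothetical sketch: user args win over the HF config, and every
# override is announced. `user_provided` stands in for whatever
# detection mechanism gets settled on below.
def resolve(name, hf_value, user_value, user_provided):
    if not user_provided:
        return hf_value
    if user_value != hf_value:
        print(f"overwriting HF {name} config value ({hf_value}) "
              f"with user arg ({user_value})")
    return user_value

# resolve("num_attention_heads", 32, 16, user_provided=True) -> prints, returns 16
```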

> I think if the user provides an arg, we overwrite the HF config's value for it. All overwritten values should get a print (e.g., "overwriting HF num_attention_heads config value (x) with user arg (y)").

How do we check whether the value is user-provided or a default value?

Say the user passes num_attention_heads as 64, which is also the default value; the parsed args would not be able to tell the two apart.

Instead, maybe we could keep the default values in a separate dictionary and have the parser default every argument to None, so we can tell when we got user input and when we should fall back to a default.

What do you think? @Quentin-Anthony

bhavnicksm avatar May 09 '24 14:05 bhavnicksm
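A minimal sketch of the None-sentinel idea described above; the argument names and default values are illustrative:

```python
# Real defaults live in a separate dict; the parser defaults everything
# to None, so any non-None parsed value must have come from the user,
# even when it equals the real default.
import argparse

DEFAULTS = {"num_attention_heads": 64, "hidden_size": 4096}

parser = argparse.ArgumentParser()
parser.add_argument("--num_attention_heads", type=int, default=None)
parser.add_argument("--hidden_size", type=int, default=None)
args = parser.parse_args()

user_provided = {k for k, v in vars(args).items() if v is not None}
for name, default in DEFAULTS.items():
    if name not in user_provided:
        setattr(args, name, default)  # fall back to the real default
```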

> How do we check whether the value is user-provided or a default value? [...] Instead, maybe we could keep the default values in a separate dictionary and have the parser default every argument to None, so we can tell when we got user input and when we should fall back to a default.

But this would mean that we have no default values and that the user needs to pass everything? If I'm misunderstanding, maybe just implement what you're describing real quick and we can iterate.

Quentin-Anthony avatar May 24 '24 04:05 Quentin-Anthony

Hi @Quentin-Anthony, I added a default-value dictionary and set the parser defaults to None: an arg counts as user input when it is not None, and the remaining values are "replaced" from the HF config (values already present in the args are skipped when copying from the config).

bhavnicksm avatar May 24 '24 07:05 bhavnicksm
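Putting the pieces together, a sketch of the resolution order described above: explicit user args beat HF config values, which beat built-in defaults. This is a hypothetical structure; the PR's actual code may differ:

```python
# Hypothetical resolution order: user args > HF config > defaults,
# printing a notice whenever a user arg overrides an HF value.
def resolve_config(user_args, hf_config, defaults):
    resolved = {}
    for name, default in defaults.items():
        user_value = user_args.get(name)  # None unless user-supplied
        hf_value = hf_config.get(name)
        if user_value is not None:
            if hf_value is not None and user_value != hf_value:
                print(f"overwriting HF {name} config value ({hf_value}) "
                      f"with user arg ({user_value})")
            resolved[name] = user_value
        elif hf_value is not None:
            resolved[name] = hf_value
        else:
            resolved[name] = default
    return resolved
```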

Hi @Quentin-Anthony, I wanted to check in on this PR. Is it still required? Is something missing here?

bhavnicksm avatar Aug 11 '24 16:08 bhavnicksm

> Hi @Quentin-Anthony, I wanted to check in on this PR. Is it still required? Is something missing here?

Yep still needed! Reviewing now.

Quentin-Anthony avatar Aug 19 '24 15:08 Quentin-Anthony

I rebased, and for some reason this PR's "files changed" view is now showing all the rebase changes. Gonna try closing and reopening to see if that fixes it.

EDIT: That did it!

Quentin-Anthony avatar Aug 19 '24 16:08 Quentin-Anthony

Thank you, @Quentin-Anthony! I enjoyed working on this with you as my first open-source PR. :)

bhavnicksm avatar Aug 22 '24 14:08 bhavnicksm