Add HuggingFace arg so that arch is automatic

Open Quentin-Anthony opened this issue 2 years ago • 3 comments

Stas Bekman had the idea of supporting a HuggingFace model as input so that the model architecture settings don't need to be dug up manually. We'd like something like:

python transformer_mem.py --hf_model_name_or_path meta-llama/Llama-2-7b-hf --num-gpus 8 --zero-stage 3 --batch-size-per-gpu 2 --sequence-length 4096
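One possible way to back such a flag is to translate the fields of a HuggingFace config into the calculator's architecture settings. The sketch below is only an illustration: the function name `arch_from_hf_config`, the `HF_TO_CALC` table, and the calculator-side names (`num_layers`, `ffn_hidden_size`, and so on) are assumptions and are not taken from transformer_mem.py. The left-hand keys are the ones commonly found in a Llama-style config.json; other model families may name them differently.

```python
# Hypothetical mapping: HuggingFace config key -> calculator setting name.
# The right-hand names are assumptions, not the script's actual arguments.
HF_TO_CALC = {
    "hidden_size": "hidden_size",
    "num_hidden_layers": "num_layers",
    "num_attention_heads": "num_attention_heads",
    "vocab_size": "vocab_size",
    "intermediate_size": "ffn_hidden_size",
}


def arch_from_hf_config(config: dict) -> dict:
    """Extract architecture settings from a HuggingFace-style config dict.

    `config` is expected to look like the contents of a model's config.json.
    With the transformers library installed, such a dict could be obtained
    via AutoConfig.from_pretrained(name).to_dict().
    """
    missing = [key for key in HF_TO_CALC if key not in config]
    if missing:
        # Fail loudly rather than guess a value for an unknown architecture.
        raise KeyError(f"config is missing expected keys: {missing}")
    return {calc_name: config[hf_key] for hf_key, calc_name in HF_TO_CALC.items()}


if __name__ == "__main__":
    # Toy values for illustration only; not a real model's configuration.
    toy_config = {
        "model_type": "llama",
        "hidden_size": 64,
        "num_hidden_layers": 2,
        "num_attention_heads": 4,
        "vocab_size": 1000,
        "intermediate_size": 256,
    }
    print(arch_from_hf_config(toy_config))
```

Keeping the mapping in a single table means support for another model family would mostly be a matter of adding or overriding entries, and unrecognized configs raise an error instead of silently producing a wrong memory estimate.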

Dec 20 '23 23:12 Quentin-Anthony

Hey @Quentin-Anthony,
is someone working on this? If not, I can try to make a PR for this.

May 07 '24 20:05 bhavnicksm

Nobody is. I'd love a PR!

May 08 '24 08:05 Quentin-Anthony

@Quentin-Anthony, I've created a draft PR; we can continue the conversation there.

May 08 '24 15:05 bhavnicksm