cookbook
Add HuggingFace arg so that arch is automatic
Stas Bekman had the idea of accepting a HuggingFace model as input so that the model architecture settings don't need to be dug up manually. We'd like something like:
python transformer_mem.py --hf_model_name_or_path meta-llama/Llama-2-7b-hf --num-gpus 8 --zero-stage 3 --batch-size-per-gpu 2 --sequence-length 4096
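To make the request concrete, here is a minimal sketch of the lookup step, assuming the architecture fields are read from the model's HuggingFace `config.json`. The function name `arch_from_hf_config`, the `HF_TO_CALC` mapping, and the calculator-side argument names are hypothetical and are not taken from `transformer_mem.py`. The sketch works on a plain dict so it runs offline; in practice the dict could come from `transformers.AutoConfig.from_pretrained(name).to_dict()`.

```python
from typing import Optional

# Hypothetical mapping: HF config.json key -> calculator argument name.
# The right-hand names are illustrative, not the script's real flags.
HF_TO_CALC = {
    "hidden_size": "hidden_size",
    "num_hidden_layers": "num_layers",
    "num_attention_heads": "num_attention_heads",
    "vocab_size": "vocab_size",
    "max_position_embeddings": "sequence_length",
}


def arch_from_hf_config(config: dict, overrides: Optional[dict] = None) -> dict:
    """Translate a HF-style config dict into calculator arguments.

    Values in `overrides` that are not None (for example flags the user
    passed explicitly on the command line) take precedence over the config.
    """
    missing = [key for key in HF_TO_CALC if key not in config]
    if missing:
        raise KeyError(f"config is missing required keys: {missing}")

    args = {calc_key: config[hf_key] for hf_key, calc_key in HF_TO_CALC.items()}

    # Explicit user settings win over values pulled from the model config.
    for key, value in (overrides or {}).items():
        if value is not None:
            args[key] = value
    return args


# Example values in the shape of a Llama-2-7b config (illustrative only).
example_config = {
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "vocab_size": 32000,
    "max_position_embeddings": 4096,
}

example_args = arch_from_hf_config(example_config, {"sequence_length": 2048})
```

Letting explicit flags override the config keeps options such as `--sequence-length` usable alongside `--hf_model_name_or_path`, and failing on missing keys avoids silently computing memory for a partially specified model.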
Hey @Quentin-Anthony,
Is someone working on this? If not, I can try to make a PR for it.
Nobody is. I'd love a PR!
@Quentin-Anthony, I created a draft PR; we can continue the conversation there.