cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
This pull request adds automated parameter calculation for all Hugging Face models. Expected behaviour:
```
python transformer_mem.py --hf_model_name_or_path meta-llama/Llama-2-7b-hf --num-gpus 8 --zero-stage 3 --batch-size-per-gpu 2...
```
Stas Bekman had the idea of supporting a HuggingFace model as input so that the model architecture settings don't need to be dug up manually. We'd like something like:
```
python transformer_mem.py...
```
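A minimal sketch of how that lookup could work, assuming `transformers` is installed. The field names below are the llama-style ones and vary by architecture, and the `hparams_from_hf` helper is hypothetical, not the PR's actual code:

```python
# Hypothetical sketch: pull architecture settings from a Hugging Face config
# instead of asking the user to supply them by hand. Not the PR's code.
from transformers import AutoConfig

def hparams_from_hf(name_or_path: str) -> dict:
    # Downloads only config.json (gated repos like Llama-2 need HF auth).
    cfg = AutoConfig.from_pretrained(name_or_path)
    return {
        "vocab_size": cfg.vocab_size,
        "hidden_size": cfg.hidden_size,
        "num_layers": cfg.num_hidden_layers,
        "num_attention_heads": cfg.num_attention_heads,
        # Not all architectures define intermediate_size; fall back to 4*h.
        "ffn_size": getattr(cfg, "intermediate_size", 4 * cfg.hidden_size),
    }

print(hparams_from_hf("meta-llama/Llama-2-7b-hf"))
```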
Addresses https://github.com/EleutherAI/cookbook/issues/36. Before:
```
$ python calc/calc_transformer_mem.py --infer --high-prec-bytes-per-val 4 --low-prec-bytes-per-val 1 --num-gpus 2 --zero-stage 3 -ca -b 1 -s 1024 -v 152064 -hs 8192 -a 64 -l...
```
Running `calc_transformer_mem.py` with the hyperparameters for Qwen1.5-72B reports 56.19 billion parameters, while the real number is around 72 billion: `python calc_transformer_mem.py --infer --high-prec-bytes-per-val 4 --low-prec-bytes-per-val 1...`
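A plausible source of such an undercount is a hard-coded 4·hidden, non-gated FFN and tied embeddings; counting with the model's actual shapes recovers roughly 72B. A back-of-the-envelope sketch, where the vocab/hidden/head values come from the command above and the layer count and FFN width are my assumptions about the Qwen1.5-72B config (verify against the model card):

```python
# Back-of-the-envelope count for a llama-style (gated-MLP, untied-embedding)
# model. L=80 and f=24576 are assumptions about the Qwen1.5-72B config,
# not values taken from this issue -- verify against the model card.
V, h, L, f = 152_064, 8192, 80, 24_576  # vocab, hidden, layers, ffn width

embeddings = 2 * V * h   # untied input and output embeddings
attn = 4 * h * h         # Q, K, V, O projections (full MHA, no GQA)
mlp = 3 * h * f          # gate, up, and down projections (SwiGLU)
total = embeddings + L * (attn + mlp)  # norms and biases omitted (tiny)

print(f"{total / 1e9:.2f}B parameters")  # -> 72.28B
```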
As per the title. This arg exists in the other two scripts but was missing from `calc_transformer_flops.py`.
Would be good to add I/O benchmarks in the style of existing communication and computation benchmarks.
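For example, a minimal sequential-write throughput probe could look like the sketch below. The path and sizes are placeholders, and a real benchmark would also cover reads, random access, and page-cache effects:

```python
# Minimal sketch of a sequential-write throughput benchmark, in the spirit
# of the existing communication/computation benchmarks. Illustrative only.
import os
import time

def write_throughput(path: str, total_bytes: int, block_bytes: int = 1 << 20) -> float:
    block = os.urandom(block_bytes)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_bytes // block_bytes):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure data actually reaches disk
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_bytes / elapsed / 1e9  # GB/s

print(f"{write_throughput('/tmp/io_bench.bin', 1 << 30):.2f} GB/s")
```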
Currently the model directory webpage at https://github.com/EleutherAI/cookbook/tree/main/model-directory isn't live and is entirely undocumented.
- [ ] Make model directory webpage live
- [ ] Add model hparam setting html page and...
Would be good to model the communication volume in bytes of a given parallelism setup (see the sketch after this list for a starting point). Situations to model:
- Different parallelism schemes
  - ZeRO-1/2/3, ZeRO++
  - 3D parallelism
- Activation...
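As a starting point, here is a back-of-the-envelope sketch of per-GPU data-parallel volume following the ZeRO paper's accounting (an all-reduce moves roughly 2x the message size; ZeRO-3 adds parameter all-gathers for roughly 3x). The multipliers ignore interconnect topology, overlap, and ZeRO++ style compression:

```python
# Back-of-the-envelope data-parallel communication volume per GPU per step,
# following the ZeRO paper's accounting. Ignores topology, overlap, and
# message-size effects; ZeRO++ compression is not modeled.
def dp_comm_volume_bytes(num_params: int, bytes_per_elem: int = 2, zero_stage: int = 0) -> int:
    if zero_stage in (0, 1, 2):
        # gradient all-reduce (reduce-scatter + all-gather): ~2 * psi
        multiplier = 2
    elif zero_stage == 3:
        # plus parameter all-gathers in forward and backward: ~3 * psi
        multiplier = 3
    else:
        raise ValueError(f"unknown ZeRO stage {zero_stage}")
    return multiplier * num_params * bytes_per_elem

for stage in (0, 1, 2, 3):
    gb = dp_comm_volume_bytes(7_000_000_000, zero_stage=stage) / 1e9
    print(f"ZeRO-{stage}: {gb:.0f} GB moved per step (7B params, bf16)")
```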
As recently pointed out in https://arxiv.org/abs/2401.00448, inference FLOPs are also important, and it would be easy to add a flag to https://github.com/EleutherAI/cookbook/blob/main/calc/calc_transformer_flops.py for the inference and training+inference cases.
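A sketch of what such a flag could compute, using the standard first-order approximations (~6·N·D FLOPs for training, ~2·N·D for forward-only inference); the function names and token counts below are hypothetical, not the script's actual interface:

```python
# Standard first-order FLOP approximations from the scaling-law literature:
# a forward pass costs ~2*N FLOPs per token, a training step ~6*N
# (forward + backward). Names and values here are illustrative.
def training_flops(n_params: float, n_train_tokens: float) -> float:
    return 6.0 * n_params * n_train_tokens

def inference_flops(n_params: float, n_infer_tokens: float) -> float:
    return 2.0 * n_params * n_infer_tokens

N, D_train, D_infer = 7e9, 2e12, 1e12  # hypothetical model and token budgets
total = training_flops(N, D_train) + inference_flops(N, D_infer)
print(f"train+inference: {total:.3e} FLOPs")
```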
While the calc scripts are correct for llama-style models, their implementation is inflexible (see https://github.com/EleutherAI/cookbook/issues/36 and https://github.com/EleutherAI/cookbook/pull/35). It'd be nice to clean this up a bit.