Jae-Won Chung

Results 58 issues of Jae-Won Chung

- `PipelineFrequencyOptimizer`: Large model training frameworks - Deepspeed - Megatron-LM - Deepspeed-Megatron - GPT-NeoX - Reuse training examples in each large model training framework (e.g., Llama pre-training or fine-tuning) -...

integration
roadmap

### Your current environment The output of `python collect_env.py` ```text PyTorch version: 2.4.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS:...

bug

For testing, we need to mock away GPUs and CPUs with those that return fake power/energy values. It would be convenient to have a generic helper class defined for it.

enhancement
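The generic mock helper requested above could look roughly like the following. This is a minimal sketch, not the project's actual test utility: the class name `FakeEnergyDevice`, its fields, and its method names are all hypothetical, chosen only to illustrate a device that returns deterministic fake power and energy values.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEnergyDevice:
    """Hypothetical stand-in for a GPU or CPU that reports scripted readings."""

    power_mw: int = 100_000  # constant fake power draw, in milliwatts
    step_s: float = 1.0  # simulated seconds that elapse per energy read
    _reads: int = field(default=0, init=False)

    def get_power_mw(self) -> int:
        """Return the fixed fake power draw."""
        return self.power_mw

    def get_energy_mj(self) -> float:
        """Return cumulative fake energy in millijoules.

        Each call advances simulated time by `step_s`, so consecutive
        reads differ by exactly `power_mw * step_s`.
        """
        self._reads += 1
        return self.power_mw * self.step_s * self._reads
```

Because the readings are deterministic, a test can assert on the exact energy delta between two reads instead of relying on real hardware counters.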

Currently, whenever `get_cpus` is called, it will instantiate a `RAPLFile` for every CPU/DRAM domain, which will in turn spawn a wraparound monitor process. However, the user may not intend to...

enhancement
good first issue

AMDSMI renamed the energy counter returned from `amdsmi_get_energy_count` from `power` to `energy_accumulator`. We want to follow this change without breaking backwards compatibility: ```python if "energy_accumulator" in energy_dict:...

good first issue
integration
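One way the backwards-compatible lookup described above might be completed is sketched below. Only the `"energy_accumulator"` check appears in the issue text. The fallback to `"power"`, the helper name `read_energy_counter`, and the error handling are assumptions made for illustration, and the dictionary is passed in directly rather than obtained from AMDSMI.

```python
def read_energy_counter(energy_dict: dict) -> int:
    """Return the raw energy counter from an AMDSMI-style result dict.

    Hypothetical helper: prefers the new key and falls back to the old one.
    """
    if "energy_accumulator" in energy_dict:
        # Newer AMDSMI releases use this key.
        return energy_dict["energy_accumulator"]
    if "power" in energy_dict:
        # Assumed fallback for older releases that still use the old key.
        return energy_dict["power"]
    raise KeyError("no energy counter key found in energy_dict")
```

Checking for the new key first keeps the code working across both versions without pinning a specific AMDSMI release.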

There should be a way to obtain host power metrics via IPMI. This would be a good addition to the `ZeusMonitor`.

enhancement

Reading from Intel RAPL counters via `sysfs` also requires `sudo`, which is a perfect thing to add to `zeusd`. Currently, GPU endpoints are structured like: `POST /gpu/{gpu_id}/{command_name}`. I can imagine...

enhancement

I was reading the framework just out of interest and found two quick fixes. In particular, the rationale behind removing `strict=True` from `zip` is: 1. It's obvious that both iterables have...

CLA Signed

Recently there has been some interest around GPU power draw (e.g., https://github.com/pytorch/pytorch/pull/132936), because large scale training consumes power at an unprecedented scale that affects datacenter power delivery and the grid. Along...

OSS contribution requested

`vim.highlight` is on its path to deprecation and the new nvim 0.11 prints out a warning message. Wherever `vim.highlight` is being used, I replaced it with `(vim.hl or vim.highlight)` so...