KAKSIS

3 issues by KAKSIS

Megatron's Expert Parallelism (EP) trains MoE models approximately 8 to 10 times faster than DeepSpeed ZeRO-3. Are there any plans or interest among developers to...

After training with Megatron and converting the checkpoint to an HF model, I want to run inference with vLLM, but loading the model fails:
File "/python3.11/site-packages/vllm/model_executor/models/utils.py", line 250, in _load_module
raise ValueError(msg)
ValueError: There is...

help wanted

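### Describe the feature
In examples/eval_subjective.py, I changed judge_models to the vllmwithchattemplate form, but the evaluation does not seem to run correctly: the final AlpacaEval output is empty. Does the subjective evaluation script support using a local model as the judge model?
### Will you implement it?
- [ ] I would like to implement this feature and create a PR!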