Below is the original code: https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/data/data_utils.py#L157 In my experiments, it runs out of memory (OOM) when the dataset size is 500,000.
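One commonly suggested workaround, assuming the OOM comes from tokenizing and materializing every sample in memory up front (as the dataset-split code linked above does): tokenize lazily in `__getitem__` instead. This is a minimal sketch, not the repo's code; the class and argument names here are illustrative.

```python
import torch
from torch.utils.data import Dataset


class LazyPromptDataset(Dataset):
    """Tokenizes on access instead of materializing 500k tokenized samples.

    `raw_samples` is any indexable collection of prompt strings and
    `tokenizer` is a HuggingFace tokenizer -- both are assumptions of
    this sketch, not names from DeepSpeed-Chat.
    """

    def __init__(self, raw_samples, tokenizer, max_seq_len=512):
        self.raw_samples = raw_samples
        self.tokenizer = tokenizer
        self.max_seq_len = max_seq_len

    def __len__(self):
        return len(self.raw_samples)

    def __getitem__(self, idx):
        # Only one sample is tokenized at a time (plus the DataLoader's
        # prefetch buffer), instead of the whole corpus living in RAM.
        enc = self.tokenizer(
            self.raw_samples[idx],
            max_length=self.max_seq_len,
            padding="max_length",
            truncation=True,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
        }
```

The trade-off is re-tokenizing every epoch; with multiple DataLoader workers that cost is usually hidden behind the GPU step.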
How can I convert Megatron-DeepSpeed's ColumnParallelLinear and RowParallelLinear into LoRA linear layers? ColumnParallelLinear is defined in: https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/mpu/layers.py#L206
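A minimal sketch of one way to do this, under the assumption that the Megatron-DeepSpeed parallel linears return an `(output, bias)` tuple from `forward` and expose `input_size` / `output_size_per_partition`, as in the `mpu/layers.py` linked above (verify against your version). This covers only the column-parallel case: the output dimension is sharded, so `lora_B` is sharded the same way as the base weight and no extra communication is needed. For `RowParallelLinear` the *input* is sharded instead, so `lora_A` would have to be sharded to match and the adapter output all-reduced alongside the base output.

```python
import math
import torch
import torch.nn as nn


class LoRAColumnParallelLinear(nn.Module):
    """Wraps a frozen ColumnParallelLinear with a trainable low-rank adapter."""

    def __init__(self, base_layer, r=8, alpha=16):
        super().__init__()
        self.base_layer = base_layer
        # Freeze the pretrained (sharded) weight; only the adapter trains.
        for p in self.base_layer.parameters():
            p.requires_grad = False

        in_features = base_layer.input_size               # full input dim
        out_features = base_layer.output_size_per_partition  # sharded output dim
        w = base_layer.weight
        # B starts at zero so the wrapped model initially matches the original.
        self.lora_A = nn.Parameter(
            torch.empty(r, in_features, device=w.device, dtype=w.dtype))
        self.lora_B = nn.Parameter(
            torch.zeros(out_features, r, device=w.device, dtype=w.dtype))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        out, bias = self.base_layer(x)  # Megatron layers return (output, bias)
        # lora_B is partitioned along the output dim exactly like the base
        # weight, so the update stays consistent under tensor parallelism.
        out = out + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling
        return out, bias
```

Usage would be to swap the target modules in place after loading the checkpoint, e.g. `attn.query_key_value = LoRAColumnParallelLinear(attn.query_key_value)` (the attribute name depends on your model definition).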
Hello! Thanks a lot for your work! I want to finetune bloomz-mt with your Megatron-DeepSpeed, but I cannot find a universal-checkpoint version of bloomz-mt or bloomz. I only found...
The training samples are 3 seconds long. What is the maximum duration supported at inference without a significant drop in performance?
My code is as follows:

```python
def init_agent_service(self):
    llm_cfg = {
        # "model": "gemini-2.5-pro-preview-05-06",
        # "model_server": "http://my_ip:port",
        # "api_key": "my_key",
        # "model": "deepseek-r1-250120",
        # "model_server": "http://my_ip:port",
        # "api_key": "my_key",
        "model": "claude-3-5-sonnet-v2@20241022",
        "model_server": "http://my_ip:port",
        "api_key": ...
```
https://company.hpc-ai.com/blog/shocking-release-deepseek-671b-fine-tuning-guide-revealed-unlock-the-upgraded-deepseek-suite-with-one-click-ai-players-ecstatic The above article only provides the GPU requirements for SFT LoRA. What about GRPO?