Below is the original code: https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/data/data_utils.py#L157 In my experiments, it runs out of memory (OOM) when the dataset size is 500,000.
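One commonly suggested workaround, assuming the OOM comes from tokenizing and materializing every sample in memory up front (as the dataset-split code linked above does): tokenize lazily in `__getitem__` instead. This is a minimal sketch, not the repo's code; the class and argument names here are illustrative.

```python
import torch
from torch.utils.data import Dataset


class LazyPromptDataset(Dataset):
    """Tokenizes on access instead of materializing 500k tokenized samples.

    `raw_samples` is any indexable collection of prompt strings and
    `tokenizer` is a HuggingFace tokenizer -- both are assumptions of
    this sketch, not names from DeepSpeed-Chat.
    """

    def __init__(self, raw_samples, tokenizer, max_seq_len=512):
        self.raw_samples = raw_samples
        self.tokenizer = tokenizer
        self.max_seq_len = max_seq_len

    def __len__(self):
        return len(self.raw_samples)

    def __getitem__(self, idx):
        # Only one sample is tokenized at a time (plus the DataLoader's
        # prefetch buffer), instead of the whole corpus living in RAM.
        enc = self.tokenizer(
            self.raw_samples[idx],
            max_length=self.max_seq_len,
            padding="max_length",
            truncation=True,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
        }
```

The trade-off is re-tokenizing every epoch; with multiple DataLoader workers that cost is usually hidden behind the GPU step.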
How can I convert Megatron-DeepSpeed's ColumnParallelLinear and RowParallelLinear into LoRA linear layers? ColumnParallelLinear is defined in: https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/mpu/layers.py#L206
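A minimal sketch of one way to do this, under the assumption that the Megatron-DeepSpeed parallel linears return an `(output, bias)` tuple from `forward` and expose `input_size` / `output_size_per_partition`, as in the `mpu/layers.py` linked above (verify against your version). This covers only the column-parallel case: the output dimension is sharded, so `lora_B` is sharded the same way as the base weight and no extra communication is needed. For `RowParallelLinear` the *input* is sharded instead, so `lora_A` would have to be sharded to match and the adapter output all-reduced alongside the base output.

```python
import math
import torch
import torch.nn as nn


class LoRAColumnParallelLinear(nn.Module):
    """Wraps a frozen ColumnParallelLinear with a trainable low-rank adapter."""

    def __init__(self, base_layer, r=8, alpha=16):
        super().__init__()
        self.base_layer = base_layer
        # Freeze the pretrained (sharded) weight; only the adapter trains.
        for p in self.base_layer.parameters():
            p.requires_grad = False

        in_features = base_layer.input_size               # full input dim
        out_features = base_layer.output_size_per_partition  # sharded output dim
        w = base_layer.weight
        # B starts at zero so the wrapped model initially matches the original.
        self.lora_A = nn.Parameter(
            torch.empty(r, in_features, device=w.device, dtype=w.dtype))
        self.lora_B = nn.Parameter(
            torch.zeros(out_features, r, device=w.device, dtype=w.dtype))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        out, bias = self.base_layer(x)  # Megatron layers return (output, bias)
        # lora_B is partitioned along the output dim exactly like the base
        # weight, so the update stays consistent under tensor parallelism.
        out = out + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling
        return out, bias
```

Usage would be to swap the target modules in place after loading the checkpoint, e.g. `attn.query_key_value = LoRAColumnParallelLinear(attn.query_key_value)` (the attribute name depends on your model definition).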
Hello! Thanks a lot for your work! I want to finetune bloomz-mt with your Megatron-DeepSpeed, but I cannot find a universal-checkpoint version of bloomz-mt or bloomz. I only found...
The training samples are 3 seconds long. What is the maximum duration supported at inference without a significant drop in performance?
My code is as follows:

```python
def init_agent_service(self):
    llm_cfg = {
        # "model": "gemini-2.5-pro-preview-05-06",
        # "model_server": "http://my_ip:port",
        # "api_key": "my_key",
        # "model": "deepseek-r1-250120",
        # "model_server": "http://my_ip:port",
        # "api_key": "my_key",
        "model": "claude-3-5-sonnet-v2@20241022",
        "model_server": "http://my_ip:port",
        "api_key": ...
```
https://company.hpc-ai.com/blog/shocking-release-deepseek-671b-fine-tuning-guide-revealed-unlock-the-upgraded-deepseek-suite-with-one-click-ai-players-ecstatic The above article only provides the GPU requirements for SFT LoRA. What about GRPO?