DeepSpeedExamples
Example models using DeepSpeed
PEFT looks like a more convenient way to integrate LoRA. Is the current approach due to company policy, or is there another reason?
Allow specifying the number of quantization groups in the inference test script using a `quantize_groups` argument. This PR is complementary to PR [#3519](https://github.com/microsoft/DeepSpeed/pull/3519) on the main repo, and should be...
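For readers who want a picture of what such an argument could look like, here is a minimal sketch using `argparse`. It is not the PR's actual code: the parser description, the default of `1`, and the helper names `positive_int` and `build_parser` are assumptions made only for illustration.

```python
import argparse


def positive_int(value: str) -> int:
    """Parse a command-line value and reject anything below 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("quantize_groups must be >= 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real inference test script may define more options.
    parser = argparse.ArgumentParser(description="inference test (sketch)")
    parser.add_argument(
        "--quantize_groups",
        type=positive_int,
        default=1,  # assumed default, not taken from the PR
        help="number of quantization groups",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"quantize_groups = {args.quantize_groups}")
```

Validating the value at parse time keeps an invalid group count from reaching the quantization code at all.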
Print the mean loss periodically, based on the DeepSpeed `steps_per_print` configuration, so that the mean loss is printed on an optimizer step boundary. To reduce log clutter, only the rank 0 loss is printed. This...
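The behaviour described here can be sketched in plain Python. This is only an illustration of the idea, not the PR's implementation: the class name `MeanLossLogger` is hypothetical, it assumes `update` is called once per optimizer step, and it assumes the mean covers only the steps since the last print.

```python
from typing import Optional


class MeanLossLogger:
    """Accumulate per-step losses and report their mean every `steps_per_print` steps."""

    def __init__(self, steps_per_print: int, rank: int = 0) -> None:
        if steps_per_print < 1:
            raise ValueError("steps_per_print must be >= 1")
        self.steps_per_print = steps_per_print
        self.rank = rank
        self._loss_sum = 0.0
        self._loss_count = 0
        self._step = 0

    def update(self, loss: float) -> Optional[float]:
        """Record one optimizer step; return the window mean on a print boundary, else None."""
        self._loss_sum += loss
        self._loss_count += 1
        self._step += 1
        if self._step % self.steps_per_print != 0:
            return None
        mean = self._loss_sum / self._loss_count
        # Start a fresh window so the next report covers only new steps.
        self._loss_sum, self._loss_count = 0.0, 0
        if self.rank == 0:
            # Only rank 0 prints, to avoid one log line per process.
            print(f"step {self._step}: mean loss {mean:.4f}")
        return mean


if __name__ == "__main__":
    logger = MeanLossLogger(steps_per_print=2)
    for value in [1.0, 3.0, 5.0, 7.0]:
        logger.update(value)
```

In a real training loop the loss would be converted to a Python float before being passed in, so that the logger does not hold on to tensors.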
Here is the error I encountered. It seems that `self._total_batch_size` is `None`, but I don't know why. ``` File "/path/model_training/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 434, in main out = trainer.generate_experience(batch_prompt['prompt'], File "/path/model_training/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py",...
Does DeepSpeed-Chat support pipeline parallelism?
I'm facing the above error in both stage 1 and stage 2 when using BLOOMZ 3B and 560M. I tried adding "model.to(device)" and "model.to('cuda')" to main.py but neither worked. The...
This [line](https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py#L299)

```python
losses += loss.float()
```

accumulates tensors that still carry autograd history, which I think results in OOM. Maybe change it to:

```python
losses += loss.detach().cpu()
```
Hi, in step 3, I run the following command and get an OOM error while initializing the reference model (the actor model initializes without problems): > Actor_Lr=9.65e-6 Critic_Lr=5e-6 deepspeed --master_port 12346 main.py \ --data_path Dahoas/rm-static \...
https://github.com/microsoft/DeepSpeedExamples/blob/b116838b905430a5fbebe3713a68d90638478aa9/applications/DeepSpeed-Chat/dschat/utils/data/data_utils.py#L301 When a job runs on multiple nodes, it seems that the data cache is built redundantly on the other nodes.
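One way to think about which process should build the cache is sketched below. It is not the code in `data_utils.py`: the function `should_build_cache` and its `shared_filesystem` flag are hypothetical, and the issue does not say whether the cache directory is shared between nodes, so both cases are shown.

```python
def should_build_cache(global_rank: int, local_rank: int, shared_filesystem: bool) -> bool:
    """Decide whether this process should build the dataset cache.

    Assumptions (not from the issue): if every node sees the same cache
    directory, one builder for the whole job is enough; otherwise each node
    needs its own copy, built by that node's first local process.
    """
    if shared_filesystem:
        return global_rank == 0
    return local_rank == 0


if __name__ == "__main__":
    # Two nodes with two processes each: global ranks 0-1 on node 0, 2-3 on node 1.
    for global_rank in range(4):
        local_rank = global_rank % 2
        shared = should_build_cache(global_rank, local_rank, shared_filesystem=True)
        local = should_build_cache(global_rank, local_rank, shared_filesystem=False)
        print(f"rank {global_rank}: shared={shared} per-node={local}")
```

In either case the remaining processes would have to wait, for example on a `torch.distributed.barrier()`, until the builder has finished writing the cache.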
Error info:
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/hybrid_engine.py", line 99, in new_inference_container
  _container.create_ds_model_config()
File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/containers/base.py", line 79, in create_ds_model_config
File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/containers/gptneox.py", line 95, in get_hidden_heads
  return self.client_module.attention.query_key_value.weight.shape[1],...
IndexError: tuple index out of range