DeepSpeedExamples
Example models using DeepSpeed
https://github.com/microsoft/DeepSpeedExamples/blob/737c6740bec38b77a24a59135b6481a53d566b38/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/training_log_output/opt-1.3b-globalBatchSize128.log#L4 Why is the PPL here 4k when we are starting with a pretrained model?
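For context on the number in this question: perplexity is conventionally the exponential of the mean per-token negative log-likelihood, so a PPL near 4k corresponds to a mean loss of roughly 8.3 nats. The sketch below only illustrates that conversion. The function names are hypothetical and are not taken from the DeepSpeed-Chat code, and it says nothing about why the logged value is what it is.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)


def mean_nll_from_perplexity(ppl: float) -> float:
    """Inverse conversion: recover the mean loss that yields a given perplexity."""
    return math.log(ppl)


if __name__ == "__main__":
    # A perplexity of 4000 corresponds to a mean loss of about 8.29 nats.
    print(round(mean_nll_from_perplexity(4000.0), 2))
```

Reading the logged perplexity back as a loss value can make it easier to compare against the loss curve reported elsewhere in the same log.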
``` JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- [WARNING] async_io requires the dev libaio .so object and headers but these were...
Description: In DeepSpeed-Chat step 3, the runtime error "The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0" is thrown when inference_tp_size>1...
I have successfully run step 1 and step 2 and generated the models, but encountered an error when running step 3: "RuntimeError: The size of tensor a (5120) must match...
Example scripts for DeepNVMe
This PR adds a new client that can test the performance of LLM serving that conforms to the OpenAI API. This gives the flexibility of starting a server separately and benchmarking that server...

Enable Intel CPU and Intel XPU support for the Benchmark Suite. Many customers use DeepSpeed on CPU and XPU for LLM models, and this benchmark suite helps them debug communication...
I was running the script from step 3: python3 train.py --step 3 --deployment-type single_gpu. The training.log shows this: A decoder-only architecture is being used, but right-padding was detected! For correct generation...
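To illustrate what the warning is about: decoder-only models continue generating from the last position of each sequence, so batches are usually padded on the left so that real tokens sit at the end. The sketch below shows left-padding in plain Python. The helper name `pad_left` and the pad id are hypothetical, and this is not the DeepSpeed-Chat data pipeline. With Hugging Face tokenizers the equivalent is typically configured through the tokenizer's `padding_side` setting.

```python
from typing import List, Tuple


def pad_left(
    sequences: List[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences to a common length.

    Returns the padded ids and a matching attention mask
    (1 for real tokens, 0 for padding).
    """
    max_len = max((len(seq) for seq in sequences), default=0)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Padding goes in front, so the last position is always a real token.
        padded.append([pad_id] * n_pad + list(seq))
        masks.append([0] * n_pad + [1] * len(seq))
    return padded, masks


if __name__ == "__main__":
    ids, mask = pad_left([[11, 12, 13], [21]], pad_id=0)
    print(ids)
    print(mask)
```

With right-padding, the final position of a shorter sequence would be a pad token, which is the situation the warning flags.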
If there is a main.py, I want to enable DeepSpeed training through a parameter, since the commands for these two training methods are different. For example, if the parameter==true, training...