DeepSpeedExamples

Example models using DeepSpeed

Results 323 DeepSpeedExamples issues

https://github.com/microsoft/DeepSpeedExamples/blob/737c6740bec38b77a24a59135b6481a53d566b38/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/training_log_output/opt-1.3b-globalBatchSize128.log#L4 Why is the PPL here 4k when we are starting with a pretrained model?
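For context on the number in question, here is a minimal sketch of how perplexity relates to the average per-token loss. It assumes the logged PPL is the exponential of the mean negative log-likelihood (natural log), which is the usual definition but is not confirmed by the snippet above. The function names are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood), assuming natural-log losses."""
    if not token_nlls:
        raise ValueError("need at least one per-token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


def loss_for_perplexity(ppl):
    """Inverse mapping: the mean per-token loss implied by a given perplexity."""
    return math.log(ppl)


if __name__ == "__main__":
    # Under this assumption, a perplexity of 4000 implies a mean loss near 8.29.
    print(round(loss_for_perplexity(4000), 2))
```

Under this definition, a perplexity of about 4k corresponds to a mean cross-entropy loss of roughly 8.29 nats per token, which is the quantity worth comparing against the loss values in the same log.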

```
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were...
```

Description: In DeepSpeed-Chat step 3, the runtime error "The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0" is thrown when inference_tp_size>1...

I have successfully run step 1 and step 2 and generated the models, but encountered an error when running step 3: "RuntimeError: The size of tensor a (5120) must match...

Example scripts for DeepNVMe

This PR adds a new client that can test the performance of LLM serving that conforms to the OpenAI API. This gives the flexibility to start a server separately and benchmark that server...

![WeChat image_20240625102800](https://github.com/microsoft/DeepSpeedExamples/assets/56241957/f4cf7cdb-a9cc-406f-9087-1d3f584cd242)

Enable Intel CPU and Intel XPU support for the Benchmark Suite. Many customers use DeepSpeed on CPU and XPU for LLM models, and this benchmark suite helps them debug communication...

I was running the script from step3: python3 train.py --step 3 --deployment-type single_gpu The training.log shows this: A decoder-only architecture is being used, but right-padding was detected! For correct generation...
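The warning quoted above concerns which side of a batch the padding tokens sit on. The sketch below illustrates the difference in plain Python; `pad_batch` is a hypothetical helper written for illustration only, not part of DeepSpeed-Chat or any tokenizer API. With left padding, the last position of every row is a real token, which is what a decoder-only model continues from during generation.

```python
def pad_batch(sequences, pad_id=0, side="left"):
    """Pad token-id lists to equal length; return (padded, attention_mask)."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        gap = width - len(seq)
        pads, zeros, ones = [pad_id] * gap, [0] * gap, [1] * len(seq)
        if side == "left":
            # Real tokens end the row, so generation continues from real context.
            padded.append(pads + list(seq))
            mask.append(zeros + ones)
        else:
            # Pad tokens end the row; this is the layout the warning flags.
            padded.append(list(seq) + pads)
            mask.append(ones + zeros)
    return padded, mask


if __name__ == "__main__":
    batch = [[5, 6, 7], [8]]
    print(pad_batch(batch, side="left"))
    print(pad_batch(batch, side="right"))
```

In Hugging Face tokenizers the corresponding setting is the `padding_side` attribute. Whether changing it is the right fix for this particular step 3 run is not established by the truncated report.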

Given a main.py, I want to switch DeepSpeed training on or off through a parameter, since the launch commands for the two training methods are different. For example, if the parameter == true, training...
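One possible way to approach this is a thin wrapper that selects the launcher from a flag. This is a sketch under assumptions: the `--use_deepspeed` flag and the helper names are hypothetical, and the only DeepSpeed-specific fact it relies on is that the `deepspeed` command-line launcher takes the training script as its argument. It only builds the command and does not run it.

```python
import argparse
import sys


def parse_args(argv=None):
    """Parse the hypothetical --use_deepspeed flag; pass other args through."""
    parser = argparse.ArgumentParser(description="Select a training launcher")
    parser.add_argument("--use_deepspeed", action="store_true",
                        help="launch the script with the deepspeed launcher")
    # parse_known_args leaves unrecognised arguments for the training script.
    return parser.parse_known_args(argv)


def build_launch_command(use_deepspeed, script="main.py", script_args=()):
    """Return the argv list for launching `script` with or without DeepSpeed."""
    launcher = ["deepspeed"] if use_deepspeed else [sys.executable]
    return launcher + [script, *script_args]


if __name__ == "__main__":
    args, passthrough = parse_args()
    # Print the command instead of executing it; hand it to subprocess.run
    # once the choice of launcher looks right.
    print(build_launch_command(args.use_deepspeed, "main.py", passthrough))
```

Keeping the choice in a wrapper means main.py itself stays the same for both launch modes. Any branching inside main.py on how the model is initialised would still be needed and is not shown here.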