DeepSpeedExamples
Example models using DeepSpeed
I am trying to run RLHF with my previously trained Actor and Reward models. However, I encounter the following exception: ``` Traceback (most recent call last): File "/home/ec2-user/SageMaker/deepspeedexamples-fork/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 516, in...
When using the official default configuration on a single V100-32G, I found that the whole pipeline OOMs. Following the other issues mentioned above, I changed the...
Does this program support TensorBoard? I could not find any TensorBoard logs.
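For reference, DeepSpeed itself can emit TensorBoard event files through the monitor section of its config. A minimal sketch is below; the output path and job name are illustrative assumptions, not values from this repo:

```python
# Minimal sketch, assuming DeepSpeed's built-in TensorBoard monitor:
# adding a "tensorboard" section to the config passed to
# deepspeed.initialize() makes the engine write scalar summaries
# (loss, learning rate, etc.) as TensorBoard event files.
ds_config = {
    "train_batch_size": 8,
    "tensorboard": {
        "enabled": True,              # turn the monitor on
        "output_path": "./ds_logs/",  # illustrative output directory
        "job_name": "step1_sft",      # illustrative run name (subdirectory)
    },
}
# View the resulting logs with:  tensorboard --logdir ./ds_logs/
```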
Steps 1 and 2 are running normally. When running step 3, I encountered an OOM (out of memory) issue again. Even when the batch size was set to 1, it...
OutOfMemoryError: CUDA out of memory. Tried to allocate 3.82 GiB (GPU 0; 31.75 GiB total capacity; 23.21 GiB already allocated; 2.43 GiB free; 25.59 GiB reserved in total by PyTorch)...
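A common mitigation for this class of OOM, sketched below under the assumption that ZeRO stage 3 with CPU offload is acceptable for the hardware: move optimizer state and parameters to CPU RAM, and preserve the effective batch size with gradient accumulation. The numbers are illustrative, not the repo's defaults:

```python
# Minimal sketch: a DeepSpeed config that trades step speed for GPU
# memory on a single 32 GB V100. All values are illustrative.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,            # keep the effective batch size up
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, optimizer state
        "offload_optimizer": {"device": "cpu"},  # optimizer state in CPU RAM
        "offload_param": {"device": "cpu"},      # parameters in CPU RAM
    },
}
```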
Hi, I am trying to train a GPT-2 model using the "DeepSpeed-Chat" code. But in step 1, when I use "--offload", I get an error. Below is the problem:
I run the training script in a multi-node environment: training/step1_supervised_finetuning/training_scripts/multi_node/run_66b.sh. But it seems the nodes are not launched successfully, and the log shows a warning as below: ``` 2023-04-21...
I have V100-32G * 8. When using lora_dim=128 with gradient_checkpointing, step 1 training runs well, but it is slow. When I drop gradient_checkpointing and use only only_optimize_lora, I get OOM. Could you...
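The slow-but-fits versus fast-but-OOM behavior matches the usual gradient-checkpointing trade-off: checkpointing discards intermediate activations during the forward pass and recomputes them during backward, so it costs roughly one extra forward pass per step but frees most activation memory. A minimal sketch using the standard Hugging Face API; the model name is illustrative:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

# Slower steps, much lower activation memory: activations are
# recomputed on the fly during the backward pass.
model.gradient_checkpointing_enable()

# Faster steps, but every layer's activations stay resident on the GPU,
# which is why dropping checkpointing can OOM even with LoRA-only updates.
# model.gradient_checkpointing_disable()
```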
Below is the original code: https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/data/data_utils.py#L157. In my experiments, it OOMs when the dataset size is 500,000.
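If the OOM stems from materializing Python lists of indices (an assumption; the linked line may do something different), a compact numpy shuffle keeps memory bounded. A minimal sketch:

```python
import numpy as np

def get_shuffle_idx(seed: int, size: int) -> np.ndarray:
    # Minimal sketch: a numpy array with a compact dtype holds 500,000
    # indices in about 2 MB, versus far more for a Python list of int
    # objects; np.uint32 covers sizes up to ~4.29 billion.
    dtype = np.uint32 if size < np.iinfo(np.uint32).max - 1 else np.int64
    rng = np.random.RandomState(seed=seed)
    idx = np.arange(start=0, stop=size, step=1, dtype=dtype)
    rng.shuffle(idx)  # in-place Fisher-Yates shuffle, no extra copies
    return idx
```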
Because the server's operating system is CentOS, installation often fails when following the method provided by the author.