
create_dataset_split function: with a large dataset it can run out of memory. In that case it should use the map function from the datasets library instead.

Open LiuShixing opened this issue 2 years ago • 0 comments

Below is the original code:

https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/data/data_utils.py#L157

In my experiments, it runs out of memory (OOM) when the dataset size is 500,000 samples.

LiuShixing, Apr 21 '23 07:04