Set shuffle=True by default in data_sampler

Open ranzhejiang opened this issue 1 year ago • 1 comments

Set shuffle=True by default in data_sampler; the discussion is in #5619.

ranzhejiang avatar Sep 13 '24 01:09 ranzhejiang

@microsoft-github-policy-service agree company="Intel"

ranzhejiang avatar Sep 13 '24 01:09 ranzhejiang

@ranzhejiang - we've fixed a number of issues with our CI, but we are still seeing this one time out even after merging the fixes in. Do you have any ideas why this might be? The tests that rely on this shouldn't hang with this change.

loadams avatar Nov 04 '24 22:11 loadams

I'm also confused by this hang. I will test it on my local machine to check.

ranzhejiang avatar Nov 06 '24 01:11 ranzhejiang

@ranzhejiang - I'm still seeing the hangs and so far have not been able to determine why in my repro setup.

loadams avatar Jan 07 '25 01:01 loadams

Sorry for the late reply. I tested this PR by training a real model in my own environment, but I still could not reproduce the hang. I will close this PR and reopen it once the hang issue has been fixed.

ranzhejiang avatar Jan 09 '25 03:01 ranzhejiang