
Feature request: ability to truncate datasets on split rather than pad

Open bghira opened this issue 2 years ago • 3 comments

As a developer working on latent diffusion model training via SimpleTuner, I have found that the built-in mechanism for splitting datasets across processes is not smart enough for cases where a robust sample-tracking mechanism is in use.

SimpleTuner keeps a 'seen' list of samples per epoch so that we do not inadvertently oversample. A side effect is that padding does not actually work: the duplicated samples appended as padding are already in the list and are simply discarded.
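In pseudocode, the pattern looks roughly like this (a minimal sketch with illustrative names, not SimpleTuner's actual code):

```python
seen = set()

def filter_seen(shard):
    """Skip samples already trained on this epoch.

    Duplicates appended as padding during the split are discarded here,
    so one rank's shard ends up shorter than the others'.
    """
    for sample_id in shard:
        if sample_id not in seen:
            seen.add(sample_id)
            yield sample_id
```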

What happens next is that one of the GPUs runs out of data just before the others would have, causing a deadlock: the main process waits for a backward pass that will never come.

My solution was to truncate the sets I'm splitting to a multiple of batch_size * gradient_steps * num_processes before splitting them. But it occurred to me that having this built in would be nice.
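For reference, a minimal sketch of that truncation (the helper name and the use of torch.utils.data.Subset are my own, not an Accelerate API):

```python
from torch.utils.data import Subset

def truncate_for_split(dataset, batch_size, gradient_steps, num_processes):
    """Trim the dataset to the largest multiple of the effective global
    batch (batch_size * gradient_steps * num_processes), so every rank
    gets the same number of full batches and no padding is required."""
    step = batch_size * gradient_steps * num_processes
    usable = (len(dataset) // step) * step
    return Subset(dataset, range(usable))
```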

bghira avatar Oct 01 '23 21:10 bghira

Nice idea! Would you like to attempt a PR with your solution? :)

muellerzr avatar Oct 11 '23 17:10 muellerzr

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 05 '23 15:11 github-actions[bot]
