Using MosaicML Streaming with Accelerate?

Open benihime91 opened this issue 1 year ago • 1 comments

Hi, how can I integrate MosaicML Streaming with Hugging Face Accelerate?

Normally, with a typical dataset and dataloader, you would need to do:

data_loader = accelerator.prepare(data_loader)

and internally, I think, Accelerate wraps the loader in a DistributedSampler of sorts. Is this required when using a MosaicML StreamingDataset? Or can I skip this step, following this comment: https://github.com/mosaicml/streaming/issues/225#issuecomment-1510478052

My use case is for multi-gpu multi-node training.

benihime91, Jul 12 '24 07:07

Hey @benihime91, what was the solution here? We've had some folks ask about using hf accelerate so would be good to know so we can add to docs.

cc @XiaohanZhangCMU

snarayan21, Sep 25 '24 20:09

I skip this wrapping step.

Wrapping the dataloader causes an error: https://github.com/mosaicml/streaming/issues/789#issuecomment-2405432617

wangyanhui666, Oct 11 '24 02:10

Closing this out since not wrapping the dataloader seems to be the correct solution here.

snarayan21, Oct 14 '24 14:10
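
For readers landing here, a minimal sketch of the approach the thread settled on: pass the model and optimizer through `accelerator.prepare`, but hand the streaming dataloader back untouched. The helper name `prepare_without_dataloader` is hypothetical (it is not part of Accelerate or Streaming), and the paths and batch size in the commented usage are placeholders. This is only one way to structure it and has not been verified against any particular library version.

```python
def prepare_without_dataloader(accelerator, model, optimizer, dataloader):
    """Prepare model and optimizer with Accelerate, leaving the dataloader unwrapped.

    Hypothetical helper: `accelerator` is expected to behave like
    accelerate.Accelerator, whose prepare() returns its arguments
    in the order they were passed.
    """
    model, optimizer = accelerator.prepare(model, optimizer)
    # The dataloader is deliberately NOT passed to accelerator.prepare().
    return model, optimizer, dataloader


# Usage sketch (assumes accelerate, torch, and mosaicml-streaming are installed;
# the remote/local paths and batch size below are placeholders):
#
#   from accelerate import Accelerator
#   from streaming import StreamingDataset
#   from torch.utils.data import DataLoader
#
#   accelerator = Accelerator()
#   dataset = StreamingDataset(remote="s3://my-bucket/data", local="/tmp/cache",
#                              batch_size=8, shuffle=True)
#   data_loader = DataLoader(dataset, batch_size=8)
#   model, optimizer, data_loader = prepare_without_dataloader(
#       accelerator, model, optimizer, data_loader)
```

Because the dataloader is not prepared, Accelerate will not move batches to the device for you, so the training loop likely needs to do that itself (for example, using `accelerator.device`).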