Using mosaicml streaming with accelerate?
Hi, how can I integrate mosaicml streaming with huggingface-accelerate?
Normally, with a typical dataset and dataloader, you would need to do:
data_loader = accelerator.prepare(data_loader)
Internally, I think accelerate wraps the loader in a DistributedSampler of sorts. Is this required when using a mosaicml streaming dataset, or can I skip this step per this comment? https://github.com/mosaicml/streaming/issues/225#issuecomment-1510478052
My use case is multi-GPU, multi-node training.
Hey @benihime91, what was the solution here? We've had some folks ask about using hf accelerate so would be good to know so we can add to docs.
cc @XiaohanZhangCMU
I skip this wrapping step.
Wrapping the dataloader causes an error: https://github.com/mosaicml/streaming/issues/789#issuecomment-2405432617
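Why skipping the wrap is reasonable: StreamingDataset is designed to partition samples across ranks and dataloader workers on its own, so an additional DistributedSampler-style split on top is redundant. The toy model below illustrates one consequence of sharding an already-sharded stream a second time, separate from the error linked above. It is a hypothetical simulation: the helpers `strided_split` and `samples_seen` are made up for illustration and are not streaming's or accelerate's actual partitioning logic.

```python
from typing import List


def strided_split(samples: List[int], rank: int, world_size: int) -> List[int]:
    """Hypothetical stand-in for a rank-level split: each rank takes a
    disjoint, strided slice. NOT streaming's real partitioning algorithm."""
    return samples[rank::world_size]


def samples_seen(num_samples: int, world_size: int, shard_twice: bool) -> List[int]:
    """Return the sorted sample IDs seen across all ranks in one epoch."""
    all_ids = list(range(num_samples))
    seen: List[int] = []
    for rank in range(world_size):
        # First split: the dataset itself restricts this rank to its share.
        local = strided_split(all_ids, rank, world_size)
        if shard_twice:
            # Second, redundant split applied to an already-sharded stream.
            local = strided_split(local, rank, world_size)
        seen.extend(local)
    return sorted(seen)


if __name__ == "__main__":
    print(len(samples_seen(16, 4, shard_twice=False)))  # 16
    print(len(samples_seen(16, 4, shard_twice=True)))   # 4
```

In this toy model, splitting once covers every sample, while splitting twice leaves each rank with only a fraction of its own share. In the thread's setup, the fix corresponds to leaving the StreamingDataset-backed DataLoader out of `accelerator.prepare`.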
Closing this out since not wrapping the dataloader seems to be the correct solution here.