Saaketh Narayan

Results 96 comments of Saaketh Narayan

Ah, that makes sense @mvpatel2000, let me try.

gloo change is in https://github.com/mosaicml/composer/pull/3509

Converting to draft since we will need to bump the peft version anyway.

Yes, your understanding is correct. You could track the cache usage similar to how we already do it in StreamingDataset ([here](https://github.com/mosaicml/streaming/blob/5f939c9057b041f10342dfc5744d2d3880e3f14b/streaming/base/dataset.py#L1177)), but may face issues with eviction since the videos...
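A minimal sketch of what tracking cache usage with least-recently-used eviction could look like. `CacheTracker`, `access`, and `cache_limit` are hypothetical names chosen for illustration. This is not StreamingDataset's actual implementation, which is at the link above. The sketch keeps a running byte count, evicts the oldest entries until a new file fits, and raises if a single file is larger than the entire budget.

```python
from collections import OrderedDict


class CacheTracker:
    """Hypothetical byte-budget tracker for locally cached files (illustrative only)."""

    def __init__(self, cache_limit: int) -> None:
        self.cache_limit = cache_limit  # maximum bytes allowed on local disk
        self.cache_usage = 0  # bytes currently accounted for
        # Maps file name -> size in bytes, ordered least to most recently used.
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def access(self, name: str, size: int) -> list:
        """Record use of `name`; return the names evicted to make room for it."""
        if size > self.cache_limit:
            # A single file bigger than the whole budget can never fit.
            raise ValueError(f"{name} ({size} bytes) exceeds the cache limit")
        if name in self._entries:
            # Already cached: only mark it as most recently used.
            self._entries.move_to_end(name)
            return []
        evicted = []
        # Evict least recently used files until the new one fits.
        while self.cache_usage + size > self.cache_limit:
            old_name, old_size = self._entries.popitem(last=False)
            self.cache_usage -= old_size
            evicted.append(old_name)
        self._entries[name] = size
        self.cache_usage += size
        return evicted


tracker = CacheTracker(cache_limit=100)
tracker.access("a.mp4", 40)
tracker.access("b.mp4", 40)
tracker.access("a.mp4", 40)  # re-use moves a.mp4 to most recently used
print(tracker.access("c.mp4", 50))  # evicts b.mp4, the least recently used
```

Eviction by recency is only one possible policy. Large, variable-sized files such as videos make any byte-budget scheme coarser, because a single eviction can free far more (or less) space than needed.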

Hey, we don't offer direct support for zip or tar since Streaming requires the data to be in a predictable format (as written by our Writer classes, such as MDSWriter,...

Hey! We don't currently guarantee a deterministic sample order if `replication` changes, but I see how that would be useful. I'll take note of this request. Thanks!

@CodeCreator do you see this even when going from replication 2 -> replication 4, for example?

@casper-hansen So StreamingDataset's `replication` argument assumes that the ranks that have replicated samples are in contiguous blocks of global rank indices. Concretely, suppose on 16 GPUs, I have a `replication`...
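To illustrate the "contiguous blocks of global rank indices" assumption, here is a small sketch. The helpers `replica_groups` and `replica_group_of` are hypothetical and are not part of Streaming's API. The sketch also assumes the world size is divisible by `replication`. Under that layout, ranks that share samples can be found with integer division.

```python
def replica_groups(world_size: int, replication: int) -> list:
    """Split global ranks into contiguous blocks of size `replication`.

    Ranks within a block are assumed to receive identical samples.
    """
    if world_size % replication != 0:
        raise ValueError("world_size must be divisible by replication (sketch assumption)")
    return [list(range(start, start + replication))
            for start in range(0, world_size, replication)]


def replica_group_of(rank: int, replication: int) -> int:
    # With contiguous blocks, the block index is just the integer quotient.
    return rank // replication


# 16 GPUs with replication=4: ranks 0-3 share samples, then 4-7, 8-11, 12-15.
print(replica_groups(16, 4))
print(replica_group_of(6, 4))  # rank 6 falls in the second block (index 1)
```

A layout where replicated ranks are strided instead (for example, ranks 0, 4, 8, and 12 sharing samples) would not match this contiguous-block assumption.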

Hey @benihime91, what was the solution here? We've had some folks ask about using HF Accelerate, so it would be good to know so we can add it to the docs. cc @XiaohanZhangCMU

Closing this out since not wrapping the dataloader seems to be the correct solution here.