Mihir Patel

Results 16 issues of Mihir Patel

Use state algos so that engine changes to algos can adjust these variables as needed. Needed for the agent.

Closes [JIRAs](https://mosaicml.atlassian.net/browse/CO-680). Putting this up as a reference during refactor discussions.

# What does this PR do? Removes the C4 dataset. It is currently broken by the `datasets` upgrade. We recommend using streaming datasets anyway, so we're just going to get rid of...

Adds precision to eval. Sets MPT to bf16. For some reason, BF16 + FSDP requires mixed_precision: FULL. It works fine without FSDP. FP16 also works fine and gives basically the...

# What does this PR do? Auto-enables the MosaicML logger when running on our platform. Waiting for the platform to auto-inject credentials. # What issue(s) does this change relate to? [CO-2183](https://mosaicml.atlassian.net/browse/CO-2183)...

# What does this PR do? In-line group to avoid extra ref. This avoids spiking peak GPU memory usage during wrapping. @sashaDoubov will test. # What issue(s) does this change...

# What does this PR do? Only deepspeed has errors with pydantic 2. Moving the pin down to there, as we don't actually use it in Composer normally.

## Summary As libcloud uses `_` in scheme names, many popular libraries, e.g. urllib, fail to parse the generated URIs. ``` In [1]: from urllib.parse import urlparse In [2]: urlparse('AZURE_BLOBS://data')...
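
The parsing failure described in the summary above can be reproduced with the standard library alone. The sketch below is illustrative only: the `split_scheme` helper and the hyphenated `azure-blobs` scheme are hypothetical names chosen for comparison, not what the PR proposes. Python's `urllib.parse` accepts only letters, digits, `+`, `-`, and `.` in a scheme, so a name containing `_` is not recognized as a scheme and the whole string falls through to the path.

```python
from urllib.parse import urlparse


def split_scheme(uri):
    """Return (scheme, netloc, path) as seen by urllib.parse.urlparse."""
    parsed = urlparse(uri)
    return parsed.scheme, parsed.netloc, parsed.path


# Underscore in the scheme: urlparse does not treat "AZURE_BLOBS" as a scheme,
# so scheme and netloc come back empty and everything lands in the path.
underscore = split_scheme("AZURE_BLOBS://data")

# Hypothetical hyphenated variant, for comparison only: parsed as expected.
hyphen = split_scheme("azure-blobs://data")

print(underscore)  # ('', '', 'AZURE_BLOBS://data')
print(hyphen)      # ('azure-blobs', 'data', '')
```

Any code that routes on `parsed.scheme` or reads the container name from `parsed.netloc` would therefore see empty strings for the underscore form.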

# What does this PR do? This PR adds Tensor Parallelism (TP) integration. As part of this, we simplify the Trainer interface for parallelism. Current limitations: - TP requires FSDP...