composer
Supercharge Your Model Training
# What does this PR do? # What issue(s) does this change relate to? # Before submitting - [ ] Have you read the [contributor guidelines](https://github.com/mosaicml/composer/blob/dev/CONTRIBUTING.md)? - [ ] Is...
# What does this PR do? Add a monitor to the MLflow logger that sets the run status to failed if the main thread exits unexpectedly # What issue(s) does this...
# What does this PR do? Change the precision to BF16 when running eval with FP8 # What issue(s) does this change relate to? https://databricks.atlassian.net/browse/GRT-3023
Adds the download API for monolithic checkpoints
# What does this PR do? Restore dev version to 0.24.0.dev0
## Expected behavior EVAL_STANDALONE_END event is missing from the pseudo code definition in the Composer event doc. It looks like the pseudo code is referring to another name for the...
# What does this PR do? Marked as draft since it depends on #3434 Disables tensor parallelism when the `tensor_parallelism_degree` is 1. This should be a no-op and any TP...
# What does this PR do? This fixes a bug where if the TP configuration (specified through `parallelism_config['tp']`) was passed in as a dict, it would not be correctly processed...
Updates the requirements on [moto[s3]](https://github.com/getmoto/moto) to permit the latest version. Changelog Sourced from moto[s3]'s changelog. 5.0.10 Docker Digest for 5.0.10: sha256:bfb9cd2a437fc7c754b3a6a66b7fb528ec1a53e0c683e8b75514bff81543cf55 General: * CloudFormation now supports cfn-lint v1, as well...
# What does this PR do? Report job status to mlflow logger # What issue(s) does this change relate to? # Before submitting - [ ] Have you read the...