Hanlin Tang

71 comments by Hanlin Tang

Thanks @vedantroy, is there a model that you are training where you encounter this issue? And is this also an issue you observe when running some of our standard benchmarks...

Yes, agreed that we need a consistent rule here! cc: @dblalock to opine on whether these in-place operations should actually return the `model`, or just always return `None`.

> In most cases except broadcast it will be thin wrapper. Did you mean I should do it for every function or just where we have some extra logic?

Yeah,...

Hi @vedantroy, yes, you can provide the `save_folder` argument to the Trainer. For more details on checkpointing, see our guide here: https://docs.mosaicml.com/en/v0.9.0/trainer/checkpointing.html, and let me know if anything there...

I sort of agree: object stores are not file systems, and mixing the two (e.g. S3Filesystem) is confusing.

Thanks for the bug report, @antoinebrl and @erosenthal-square. We'll take a look and try a fix. `torchmetrics` was also recently updated to 1.0, and they may have fixed their...

Hello @growlix, are you running this in `fp8`? If so, this issue was fixed in https://github.com/mosaicml/composer/pull/2907 and released in [v0.19.0](https://github.com/mosaicml/composer/releases/tag/v0.19.0), so you should upgrade your composer version.

Hi @tginart and @Paladiamors, it would be helpful to share some more information about your machine and OS specs. Could you try to run https://github.com/mosaicml/composer/blob/dev/composer/utils/collect_env.py, and also share...

Thank you both for the information! cc: @karan6181 and @knighton