torchflare
Bump torch from 2.0.1 to 2.2.1
Bumps torch from 2.0.1 to 2.2.1.
Release notes
Sourced from torch's releases.
PyTorch 2.2.1 Release, bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
- Fix missing OpenMP support on Apple Silicon binaries (pytorch/builder#1697)
- Fix crash when mixing lazy and non-lazy tensors in one operation (pytorch/pytorch#117653)
- Fix PyTorch performance regression on Linux aarch64 (pytorch/builder#1696)
- Fix silent correctness in DTensor `_to_copy` operation (pytorch/pytorch#116426)
- Fix properly assigning `param.grad_fn` for next forward (pytorch/pytorch#116792)
- Ensure gradients clear out pending `AsyncCollectiveTensor` in FSDP Extension (pytorch/pytorch#116122)
- Fix processing unflatten tensor on compute stream in FSDP Extension (pytorch/pytorch#116559)
- Fix FSDP `AssertionError` on tensor subclass when setting `sync_module_states=True` (pytorch/pytorch#117336)
- Fix DCP state_dict not correctly finding the FQN when the leaf module is wrapped by FSDP (pytorch/pytorch#115592)
- Fix OOM when returning an `AsyncCollectiveTensor` by forcing `_gather_state_dict()` to be synchronous with respect to the main stream (pytorch/pytorch#118197) (pytorch/pytorch#119716)
- Fix Windows runtime `torch.distributed.DistNetworkError: [WinError 32] The process cannot access the file because it is being used by another process` (pytorch/pytorch#118860)
- Update supported Python versions in package description (pytorch/pytorch#119743)
- Fix SIGILL crash during `import torch` on CPUs that do not support SSE4.1 (pytorch/pytorch#116623)
- Fix DCP RuntimeError in `get_state_dict` and `set_state_dict` (pytorch/pytorch#119573)
- Fixes for HSDP + TP integration with device_mesh (pytorch/pytorch#112435) (pytorch/pytorch#118620) (pytorch/pytorch#119064) (pytorch/pytorch#118638) (pytorch/pytorch#119481)
- Fix numerical error with `mixedmm` on NVIDIA V100 (pytorch/pytorch#118591)
- Fix RuntimeError when using SymInt input invariant when splitting graphs (pytorch/pytorch#117406)
- Fix compile `DTensor.from_local` in trace_rule lookup (pytorch/pytorch#119659)
- Improve torch.compile integration with CUDA 11.8 binaries (pytorch/pytorch#119750)
Release tracker pytorch/pytorch#119295 contains all relevant pull requests related to this release as well as links to related issues.
PyTorch 2.2: FlashAttention-v2, AOTInductor
PyTorch 2.2 Release Notes
- Highlights
- Backwards Incompatible Changes
- Deprecations
- New Features
- Improvements
- Bug fixes
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch® 2.2! PyTorch 2.2 offers ~2x performance improvements to `scaled_dot_product_attention` via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-Python server-side deployments. This release also includes improved torch.compile support for Optimizers, a number of new inductor optimizations, and a new logging mechanism called TORCH_LOGS.
Please note that we are deprecating macOS x86_64 support, and PyTorch 2.2.x will be the last version series that supports macOS x64.
Along with 2.2, we are also releasing a series of updates to the PyTorch domain libraries. More details can be found in the library updates blog.
This release is composed of 3,628 commits and 521 contributors since PyTorch 2.1. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.2. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.
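To make the headline change above concrete, here is a minimal pure-Python sketch of the math that `torch.nn.functional.scaled_dot_product_attention` computes, namely `softmax(Q·Kᵀ/√d)·V`. This illustrates the operation's semantics only; the actual PyTorch kernel (FlashAttention-v2) fuses these steps for the ~2x speedup, and the tiny example inputs below are made up for illustration.

```python
import math

def scaled_dot_product_attention(q, k, v):
    """Reference semantics of softmax(q @ k.T / sqrt(d)) @ v for 2-D lists.

    q: (n, d) queries, k: (m, d) keys, v: (m, dv) values.
    Returns an (n, dv) list of lists.
    """
    d = len(q[0])
    # scores[i][j] = dot(q[i], k[j]) / sqrt(d)
    scores = [[sum(qi * kj for qi, kj in zip(qrow, krow)) / math.sqrt(d)
               for krow in k] for qrow in q]
    out = []
    for srow in scores:
        # Row-wise softmax, subtracting the max for numerical stability.
        m = max(srow)
        exps = [math.exp(s - m) for s in srow]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output row is the attention-weighted sum of value rows.
        out.append([sum(w * vrow[c] for w, vrow in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

# Toy inputs: one query attending over two key/value rows.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(q, k, v))
```

In real code you would call `torch.nn.functional.scaled_dot_product_attention(q, k, v)` on tensors and let PyTorch dispatch to the fused FlashAttention-v2 backend where eligible.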
Summary:
... (truncated)
Commits
- 6c8c5ad [RelEng] Define `BUILD_BUNDLE_PTXAS` (#119750) (#119988)
- f00f0ab fix compile DTensor.from_local in trace_rule_look up (#119659) (#119941)
- 077791b Revert "Update state_dict.py to propagate cpu offload (#117453)" (#119995)
- 3eaaeeb Update state_dict.py to propagate cpu offload (#117453) (#119916)
- 0aa3fd3 HSDP + TP integration bug fixes (#119819)
- eef51a6 [Inductor] Skip triton templates for mixedmm on SM70- (#118591) (#119894)
- 940358f [dtensor] fix dtensor _to_copy op for mix precision (#116426) (#119687)
- 24e4751 [state_dict] Calls wait() for the DTensor to_local() result (#118197) (#119692)
- dcaeed3 [DCP][state_dict] Fix the issue that get_state_dict/set_state_dict ig… (#119807)
- 4f882a5 Properly preserve SymInt input invariant when splitting graphs (#117406) (#11...
- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)