PR #17636: [NVIDIA GPU] Enhance concurrency handling in cross-rank address sharing
Imported from GitHub PR https://github.com/openxla/xla/pull/17636
This is a followup PR to https://github.com/openxla/xla/pull/15144. A distributed cache is maintained when device addresses are shared across ranks. There are two issues withe the existing implementation:
- The cache is not guarded by mutex;
- The cache initialization process has redundant accesses.
These issues can cause race conditions or deadlocks when the progress on different ranks is very close. Consequently, we introduce the following enhancements:
- Guard the cache with mutex;
- Shard the initialization process by rank, so that each rank handles only its own piece of the cache and, in theory, no two ranks have overlapping accesses.
Copybara import of the project:
-- a6472fc75fd0411bd8e65f27082e21e9a946ab17 by Terry Sun [email protected]:
enhance concurrency handling
-- 356ab824b95d66c793e361882e95d70689759ffd by Terry Sun [email protected]:
lock mutex
-- 29ebb2de64711bf4b4a08cf1593317228b56f825 by Terry Sun [email protected]:
bring back test
-- 91b911f0aaac0e590636a82956b464436e94ef9f by Terry Sun [email protected]:
better lock granularity
-- cc1d93a5f1032a205473961b2c2d3e14bee3a9c6 by Terry Sun [email protected]:
guard all accesses
Merging this change closes #17636
FUTURE_COPYBARA_INTEGRATE_REVIEW=https://github.com/openxla/xla/pull/17636 from terryysun:terryysun/sync_fix cc1d93a5f1032a205473961b2c2d3e14bee3a9c6