[Hardware][TPU] Implement tensor parallelism with Ray

Open WoosukKwon opened this issue 1 year ago • 1 comment

This PR implements a Ray TPU executor to support distributed inference on TPU.

Work in progress. Three major issues:

NOTE: This PR was implemented before #5408. It needs to be rebased to reflect those changes.

Jun 26 '24 20:06 WoosukKwon

~~For this PR, I will merge it after getting reviews. :)~~

The changes outside the TPU backend were reviewed in #6812 and #6813.

Jun 26 '24 21:06 WoosukKwon