BladeDISC
[WIP] Add AutoOffloadingPass
This PR adds an offloading pass that offloads buffers with long live ranges to the host and reloads them to the device at the right place, reducing peak memory.
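To illustrate why offloading a buffer with a long live range lowers the peak, here is a minimal toy sketch in Python. It is not the pass's implementation: the `Buffer` type, the `min_size` and `min_gap` thresholds, and the forward/backward schedule are all hypothetical, and host transfers are assumed to be instantaneous.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Buffer:
    name: str
    size: int        # device memory footprint, arbitrary units
    uses: List[int]  # sorted schedule steps where the buffer is written or read


def resident_intervals(buf: Buffer, min_size: int, min_gap: int) -> List[Tuple[int, int]]:
    """Inclusive step intervals during which `buf` occupies device memory.

    Hypothetical policy: if a buffer is at least `min_size` and two consecutive
    uses are more than `min_gap` steps apart, it is offloaded to the host right
    after the earlier use and reloaded at the later use.
    """
    intervals = []
    start = buf.uses[0]
    for prev, nxt in zip(buf.uses, buf.uses[1:]):
        if buf.size >= min_size and nxt - prev > min_gap:
            intervals.append((start, prev))  # resident until the offload point
            start = nxt                      # resident again from the reload point
    intervals.append((start, buf.uses[-1]))
    return intervals


def peak_memory(buffers: List[Buffer], num_steps: int, offload: bool = False,
                min_size: int = 0, min_gap: int = 0) -> int:
    """Peak device memory over the schedule, with or without offloading."""
    usage = [0] * num_steps
    for buf in buffers:
        if offload:
            intervals = resident_intervals(buf, min_size, min_gap)
        else:
            # Without offloading, the buffer stays live from first to last use.
            intervals = [(buf.uses[0], buf.uses[-1])]
        for start, end in intervals:
            for step in range(start, end + 1):
                usage[step] += buf.size
    return max(usage)


# Toy training schedule: activations produced in the forward steps (0-3)
# and consumed again in the backward steps (4-7), in reverse order.
buffers = [
    Buffer("act0", 100, [0, 1, 7]),
    Buffer("act1", 100, [1, 2, 6]),
    Buffer("act2", 100, [2, 3, 5]),
    Buffer("act3", 100, [3, 4]),
]

baseline = peak_memory(buffers, num_steps=8)
offloaded = peak_memory(buffers, num_steps=8, offload=True, min_size=50, min_gap=3)
print(baseline, offloaded)
```

In this toy schedule, `act0` and `act1` are offloaded between their forward and backward uses, so the peak drops from 400 to 200 units. The numbers are illustrative only and unrelated to the benchmark result reported for this PR.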
Benchmarked on part of a training graph, this pass reduced peak memory by 32%: