PiPPy
[SPMD][Fusion] tracking - move global buffer to just before first fusion
Per Alisson: we can reduce memory overhead by creating the global buffer at first use (i.e. just before the first fusion) rather than instantiating it at the start of the graph, as is done currently. Activations are released during the backward pass, so delaying the allocation can reduce peak memory.
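To illustrate the reasoning, here is a minimal sketch of the peak-memory arithmetic. It is a toy model, not the PiPPy implementation: `peak_memory`, the step timeline, and all sizes are hypothetical values chosen for illustration. It assumes the live activation footprint shrinks at each backward step and that the first fusion happens a few steps into the backward pass.

```python
from typing import List


def peak_memory(live_activation_mb: List[int], buffer_mb: int, alloc_step: int) -> int:
    """Return peak memory (MB) when the buffer exists from alloc_step onward."""
    return max(
        act + (buffer_mb if step >= alloc_step else 0)
        for step, act in enumerate(live_activation_mb)
    )


# Hypothetical live-activation footprint (MB) at each backward step;
# it shrinks as activations are released.
live_activation_mb = [800, 700, 550, 400, 250, 100]
buffer_mb = 200          # hypothetical size of the global fusion buffer
first_fusion_step = 3    # hypothetical step of the first fusion

# Current behavior: buffer allocated at the start of the graph.
eager_peak = peak_memory(live_activation_mb, buffer_mb, alloc_step=0)

# Proposed behavior: buffer allocated just before the first fusion.
lazy_peak = peak_memory(live_activation_mb, buffer_mb, alloc_step=first_fusion_step)

print(f"eager peak: {eager_peak} MB, lazy peak: {lazy_peak} MB")
```

In this toy timeline the buffer no longer overlaps with the steps where activation memory is highest, so the peak drops by the buffer size. The actual saving depends on how far into the backward pass the first fusion falls.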