We should shuffle around the realization order to minimize peak memory usage
Consider a pipeline with three outputs f2, g2, h2, which call Funcs f1, g1, h1 respectively. Everything is compute_root. The realization order f1 f2 g1 g2 h1 h2 uses much less intermediate memory than the order f1 g1 h1 f2 g2 h2, because each intermediate's buffer can be freed before the next one is allocated; in the second order, f1, g1, and h1 are all live at once.
We should shuffle the order of realizations at each loop level in schedule_functions to minimize the number of overlapping buffer lifetimes. This could be done by identifying each loop level used in a compute_at, and then, for each one, computing a new realization order for that loop level. It would have to be done at the level of fused groups, not individual Funcs.
This seems like something that should be schedulable, instead of automatic?
Is there ever a situation where we'd choose to use more than the minimum memory?
It also affects locality, so there might be a trade-off here. Also, if the allocations are all dynamically sized, the peak usage (and thus the best order) depends on those sizes, so the compiler won't be able to infer it statically.
You can already sort of schedule it with compute_at(the_func_you_want_to_go_before, Var::outermost()).