torchrec
Filter out batch_fused from available kernels unless fused_params is explicitly set and contains an optimizer
Summary: As the title says.
One common mishap with the current optimizer fusion is that the planner may select batch_fused even if fused_params is empty, in which case the optimizer silently defaults to SGD. This is dangerous because it changes the model's behavior without the author knowing.
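A minimal sketch of the filtering rule, for illustration only. It uses plain strings and a hypothetical helper, `filter_compute_kernels`, not torchrec's actual sharder or planner API. The `"optimizer"` key and the kernel names other than batch_fused are assumptions.

```python
from typing import Any, Dict, List, Optional

# Kernel name as written in this PR; the other names used below are illustrative.
BATCH_FUSED = "batch_fused"


def filter_compute_kernels(
    kernels: List[str],
    fused_params: Optional[Dict[str, Any]],
) -> List[str]:
    """Hypothetical helper: drop batch_fused unless an optimizer was set explicitly."""
    # Assumption: the optimizer is stored under the key "optimizer".
    has_optimizer = fused_params is not None and "optimizer" in fused_params
    if has_optimizer:
        return list(kernels)
    # No explicit optimizer: hide the fused kernel so the planner cannot
    # pick it and silently fall back to SGD.
    return [k for k in kernels if k != BATCH_FUSED]


available = ["dense", "sparse", BATCH_FUSED]
print(filter_compute_kernels(available, None))
print(filter_compute_kernels(available, {}))
print(filter_compute_kernels(available, {"optimizer": "adagrad"}))
```

With this rule, both a missing and an empty fused_params remove batch_fused from the candidates, so the planner can only choose it when the author has opted in to a specific optimizer.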
Ideally, fused params would not be passed in at the sharder level at all, but inferred from the module itself. Still, this is a step in the right direction (I hope).
Differential Revision: D36180601
This pull request was exported from Phabricator. Differential Revision: D36180601