torchrec
Always return .int() conversion from _unwrap_kjt
Summary: With _unwrap_kjt_for_cpu in place and the runtime_device recorded in QuantEmbeddingBag, the conditional on the input device in _unwrap_kjt is no longer needed. This path should always record the cuda device impl, because features will be on either cuda or meta, so _unwrap_kjt can return the .int() conversion unconditionally.
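A minimal sketch of the resulting shape of the function, not the actual torchrec code. StubTensor, StubKJT and unwrap_kjt_sketch are hypothetical names introduced only for illustration, so the sketch runs without torch, torchrec or a GPU. It assumes the function returns indices, offsets and optional weights, and that .int() behaves like torch.Tensor.int() (a cast to int32 on the same device).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StubTensor:
    """Hypothetical stand-in for torch.Tensor; tracks only dtype and device."""

    data: List[int]
    dtype: str = "int64"
    device: str = "meta"

    def int(self) -> "StubTensor":
        # Mirrors torch.Tensor.int(): same data and device, int32 dtype.
        return StubTensor(list(self.data), "int32", self.device)


@dataclass
class StubKJT:
    """Hypothetical stand-in for KeyedJaggedTensor (only the accessors used here)."""

    _values: StubTensor
    _offsets: StubTensor
    _weights: Optional[StubTensor] = None

    def values(self) -> StubTensor:
        return self._values

    def offsets(self) -> StubTensor:
        return self._offsets

    def weights_or_none(self) -> Optional[StubTensor]:
        return self._weights


def unwrap_kjt_sketch(
    features: StubKJT,
) -> Tuple[StubTensor, StubTensor, Optional[StubTensor]]:
    # No branch on the input device: indices and offsets are always cast to int32.
    return (
        features.values().int(),
        features.offsets().int(),
        features.weights_or_none(),
    )


kjt = StubKJT(StubTensor([3, 7, 7, 1]), StubTensor([0, 2, 4]))
indices, offsets, weights = unwrap_kjt_sketch(kjt)
print(indices.dtype, offsets.dtype, weights)
```

The point of the sketch is that the cast no longer depends on where the features live: a meta-device input gets the same int32 conversion that a cuda input would.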
Differential Revision: D56765511