
Unnecessary FP64 cast for `padding_idx` in `EmbeddingDenseBackward`

Open unterumarmung opened this issue 6 months ago • 3 comments

`torch_xla/csrc/tensor_ops.cpp#L232` converts the `int64_t` `padding_idx` to `double`. This introduces FP64 ops even though `indices_rank1` is always `int32` or `int64`, while the upstream ATen implementation keeps everything in integer space.
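For reference, the line in question looks roughly like the sketch below. This is a paraphrase based on my reading of the file, not an exact copy; the `tensor_methods::ne` / `tensor_methods::unsqueeze` helper names may differ slightly from the current source:

```cpp
// Approximate shape of the current code in EmbeddingDenseBackward.
// The static_cast<double> promotes an integer scalar to FP64 solely to
// compare it against an int32/int64 indices tensor:
XLATensorPtr skip_padding = tensor_methods::unsqueeze(
    tensor_methods::ne(indices_rank1, static_cast<double>(padding_idx)),
    /*dim=*/1);
```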

Questions

  • Was the double cast introduced for a specific historical or hardware reason?
  • How is this handled on devices without native FP64 support (e.g. TPU)?

Proposed fix

  1. Keep `padding_idx` as `int64_t`, or
  2. Cast it to the same dtype as `indices_rank1` before comparison.

Either option would avoid unnecessary FP64 operations and align with ATen’s behavior.
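For concreteness, here is a minimal sketch of option 2 expressed at the ATen level (illustrative only; the actual change would live in torch_xla's `tensor_ops.cpp`, and `indices_rank1` / `padding_idx` are the names from the issue description):

```cpp
#include <torch/torch.h>

// Build the "skip padding" mask without leaving integer space: comparing an
// int64 tensor against an int64 scalar yields a bool mask directly, so no
// FP64 constant needs to be materialized.
torch::Tensor skip_padding_mask(const torch::Tensor& indices_rank1,
                                int64_t padding_idx) {
  return indices_rank1.ne(padding_idx).unsqueeze(1);
}
```

The same idea should carry over to the XLA tensor helpers: pass `padding_idx` (or a scalar of `indices_rank1`'s dtype) directly to the comparison instead of routing it through `double`.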

Thanks for taking a look!

unterumarmung avatar Jun 18 '25 21:06 unterumarmung

Thank you for filing this issue. Unfortunately, I have no idea why we cast padding_idx.

@bhavya01 @tengyifei @asuhan Any thoughts on this?

ysiraichi avatar Jun 23 '25 12:06 ysiraichi

I don't know the reason for casting `padding_idx` either. On the surface it looks unnecessary, but I am not sure whether it was done to avoid some XLA compiler error. @unterumarmung Do you want to create a PR to test removing the cast?

bhavya01 avatar Jun 24 '25 18:06 bhavya01

@bhavya01 I've opened PR #9406. Since the fix is trivial, I didn’t test it locally — as a first-time contributor, I don’t yet have a local development environment set up. I’m relying on CI to validate the change and hope that’s acceptable. If additional tests are needed, I’m happy to add them. Also, I believe CI needs to be explicitly enabled for first-time contributors — could you help with that?

unterumarmung avatar Jun 25 '25 11:06 unterumarmung