Yukio Siraichi comments

Results: 80 comments of Yukio Siraichi

[torchbench] Detectron2 benchmarks failing to run.

After further investigation, I found out the issue is due to a combination of 2 factors: - The model, as well as the example inputs, are converted to `float16` -...

[torchbench] Detectron2 benchmarks failing to run.

Apparently, after doing (1), I am getting another error:

```python
  File "/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/proposal_utils.py", line 121, in find_top_rpn_proposals
    keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
  File "/lib/python3.8/site-packages/detectron2/layers/nms.py", line 20, in batched_nms
    return box_ops.batched_nms(boxes.float(),...
```

[torchbench] Detectron2 benchmarks failing to run.

I see. So, maybe a solution is to pass `--precision fp32` when instantiating the benchmark, while having `XLA_USE_FP16` set. What do you think?

[torchbench] Detectron2 benchmarks failing to run.

This issue was temporarily fixed by #6389. #6404 details better fixes to this upcasting problem, one of them being the actual problem description in #6403.

[torchbench] Detectron2 benchmarks failing to run.

Apparently, this issue was not due to conversion issues (https://github.com/pytorch/pytorch/issues/115792) as we once thought, but it's a real problem (more details [in this comment](https://github.com/pytorch/xla/issues/6336#issuecomment-1902677834)).

[torchbench] Detectron2 benchmarks failing to run.

@miladm @JackCaoG Here's what I found when looking into this issue (`nms` falling back to the CPU kernel): even though there's an [implementation of `nms` inside PyTorch/XLA](https://github.com/pytorch/xla/blob/92bb381af277140a2d5fe8af4cd371a3c9c5c2d1/torch_xla/csrc/init_python_bindings.cpp#L649-L669), it appears that the...

[torchbench] Detectron2 benchmarks failing to run.

@JackCaoG While the solution in [this comment](https://github.com/pytorch/xla/issues/6336#issuecomment-1989716090) works, I thought it would make more sense to implement a `CompositeExplicitAutograd` version directly in TorchVision. What do you think?
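
For readers unfamiliar with the operator discussed in this thread, here is a minimal pure-Python sketch of what greedy non-maximum suppression computes. It is not the TorchVision or PyTorch/XLA implementation. The names `iou` and `nms_reference` are made up for illustration, and boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_reference(boxes: Sequence[Box], scores: Sequence[float],
                  iou_threshold: float) -> List[int]:
    """Hypothetical reference helper: returns indices of kept boxes,
    ordered by decreasing score."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box
        # by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms_reference(boxes, scores, iou_threshold=0.5)
```

Note that the length of the returned index list depends on the input data, not only on the input shapes.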