
Worse performance than ATen: aten._log_softmax

Status: Open · opened by IvanYashchuk

🐛 Describe the bug

aten._log_softmax.default

Here are the results relative to ATen (values below 1.0 mean the nvFuser path is slower):

benchmark      geomean   20th percentile   50th percentile   80th percentile
HuggingFace    0.91      0.63              0.99              1.21
Torchbench     0.99      0.99              0.99              0.99
TIMM           0.99      0.98              0.99              1.0

Both the ATen and nvFuser paths use CUDA Graphs.
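The summary columns in the table can be computed from per-sample timings along these lines. This is a minimal sketch, assuming each sample's value is the ATen time divided by the nvFuser time and that percentiles use linear interpolation between sorted samples. The aten_ops_perf tool may compute them differently, and geomean, percentile and summarize are hypothetical names used only for illustration.

```python
import math

def geomean(values):
    """Geometric mean of positive ratios, computed in log space for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted samples."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def summarize(aten_times, nvfuser_times):
    """Per-sample speedup = ATen time / nvFuser time (assumed metric), then summary stats."""
    speedups = [a / n for a, n in zip(aten_times, nvfuser_times)]
    return {
        "geomean": geomean(speedups),
        "p20": percentile(speedups, 20),
        "p50": percentile(speedups, 50),
        "p80": percentile(speedups, 80),
    }
```

A geomean close to 1.0 can hide a slow tail, which is why the HuggingFace 20th percentile (0.63) is the more telling number.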

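To reproduce, first apply this patch. It removes aten._log_softmax.default, aten._log_softmax_backward_data.default and aten.expand.default from the ops that TorchRefsNvfuserCapabilityMode skips: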

diff --git a/torch/_prims/context.py b/torch/_prims/context.py
index 203d73fd94..1789775e05 100644
--- a/torch/_prims/context.py
+++ b/torch/_prims/context.py
@@ -254,9 +254,9 @@ def _is_func_unsupported_nvfuser(
 class TorchRefsNvfuserCapabilityMode(TorchRefsMode):
     def __init__(self, *, skip_ops=()):
         aten_ops_to_skip = (
-            "aten._log_softmax.default",
-            "aten._log_softmax_backward_data.default",
-            "aten.expand.default",
+            #"aten._log_softmax.default",
+            #"aten._log_softmax_backward_data.default",
+            #"aten.expand.default",
         )
         self.skip_ops = tuple(skip_ops) + aten_ops_to_skip
         super().__init__(

Then clone the benchmark scripts and run the HuggingFace suite for this op:

git clone https://gitlab-master.nvidia.com/iyashchuk/aten_ops_perf.git
cd aten_ops_perf
python aten_ops_perf.py --suite huggingface --dtype float32 --max-samples 100 --op aten._log_softmax.default

Check out this gist for the logs: https://gist.github.com/IvanYashchuk/8f433d9512ab1f02a7f960072ba10bb0

The badly performing samples (input shape and reduction dim) are:

  • (512, 50265) dim=1
  • (8192, 50265) dim=1
  • (1024, 50265) dim=1
  • (4096, 50265) dim=1
  • (2048, 50265) dim=1
  • (511, 30522) dim=1
  • (2048, 50005) dim=1
  • (256, 256008) dim=1
  • (157, 50257) dim=1
  • (1024, 50005) dim=1
  • (256, 128112) dim=1
  • (64, 128) dim=1
  • (1024, 50358) dim=1
  • (508, 50272) dim=1
  • (511, 50257) dim=1
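To put these shapes in perspective: every sample reduces along dim=1, and apart from (64, 128) the rows are roughly 30k to 256k elements long. The sketch below computes a lower bound on memory traffic, assuming float32 inputs and an ideal kernel that reads each input element once and writes each output element once. min_traffic_bytes, min_time_us and the bandwidth_gb_s argument are hypothetical helpers for illustration, not part of the benchmark tool; an implementation that makes several passes over each row moves proportionally more data.

```python
from math import prod

# Sample shapes from the list above (all reduced along dim=1).
SHAPES = [
    (512, 50265), (8192, 50265), (1024, 50265), (4096, 50265), (2048, 50265),
    (511, 30522), (2048, 50005), (256, 256008), (157, 50257), (1024, 50005),
    (256, 128112), (64, 128), (1024, 50358), (508, 50272), (511, 50257),
]

def min_traffic_bytes(shape, elem_size=4):
    """Lower bound on DRAM traffic: one read plus one write of every element (float32 by default)."""
    return 2 * prod(shape) * elem_size

def min_time_us(shape, bandwidth_gb_s, elem_size=4):
    """Bandwidth-bound lower limit in microseconds; bandwidth_gb_s is an assumed peak, not a measurement."""
    return min_traffic_bytes(shape, elem_size) / (bandwidth_gb_s * 1e9) * 1e6

if __name__ == "__main__":
    for shape in SHAPES:
        mb = min_traffic_bytes(shape) / 1e6
        print(f"{shape!s:>16}  row length {shape[1]:>7}  >= {mb:9.1f} MB moved")
```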

The _log_softmax decomposition is implemented here: https://github.com/pytorch/pytorch/blob/35be73df094f02dd26562cf665a6158e80bc4045/torch/_decomp/decompositions.py#L988-L1006
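For reference, the usual numerically stable formulation of log-softmax along dim=1 subtracts the row maximum before exponentiating: log_softmax(x)_i = (x_i - max_j x_j) - log(sum_j exp(x_j - max_j x_j)). Below is a minimal pure-Python sketch of that max-shift pattern. It illustrates the math only and is not a copy of the linked decomposition, which may differ in details such as dtype handling; log_softmax_rows is a hypothetical name.

```python
import math

def log_softmax_rows(x):
    """Row-wise (dim=1) log-softmax of a 2-D list using the max-shift trick."""
    out = []
    for row in x:
        m = max(row)                         # reduction 1: row maximum
        shifted = [v - m for v in row]       # shift so exp() cannot overflow
        lse = math.log(sum(math.exp(v) for v in shifted))  # reduction 2: log-sum-exp
        out.append([v - lse for v in shifted])             # final elementwise subtract
    return out
```

The two separate reductions over each row (max, then sum of exponentials) are what make the wide rows in the sample shapes reduction-heavy.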

Versions

Checked on upstream master.

IvanYashchuk, Nov 03 '22 14:11