AMDMIGraphX

Infinite recursion in module::sort(), speech_transformer

Open bpickrel opened this issue 1 year ago • 2 comments

The speech_transformer model in the DLM PyTorch performance test causes an exception that traces to infinite recursion in the MIGraphX method module::sort().
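For readers unfamiliar with this failure mode, here is a minimal sketch of one common way a recursive sort can fail to terminate: a dependency cycle in the graph being sorted. This is an assumption for illustration only. The issue does not establish that a cycle is the cause here, and the code below is not MIGraphX's module::sort(). The function name topo_sort and the toy graphs are hypothetical.

```python
def topo_sort(deps):
    """Return nodes ordered so that every node comes after its inputs.

    deps maps each node to the list of nodes it depends on.
    Raises ValueError on a cycle. Without the "visiting" check,
    visit() would recurse forever on a cyclic graph.
    """
    order = []
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return
        if state.get(node) == "visiting":
            raise ValueError(f"cycle detected at {node!r}")
        state[node] = "visiting"
        for dep in deps.get(node, []):
            visit(dep)
        state[node] = "done"
        order.append(node)

    for node in deps:
        visit(node)
    return order


if __name__ == "__main__":
    # Acyclic toy graph: sorts normally.
    print(topo_sort({"x": [], "mean": ["x"], "sub": ["x", "mean"]}))
    # Cyclic toy graph: reported as an error instead of recursing without bound.
    try:
        topo_sort({"a": ["b"], "b": ["a"]})
    except ValueError as err:
        print(err)
```

The guard turns unbounded recursion into a diagnosable error, which can help narrow down where a malformed graph was produced.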

The error is only seen in branch fix_find_pointwise_reduce because, in the develop branch, the test exits earlier due to a different bug. The error is not seen in commit efc01466f, which was created from develop commit ee68f7261f2, but it does occur when I merge with commit 2bdd02d38c41 (May 3). It's not known which commit between those two first introduced the bug.

Steps:

  1. Set up DLM performance test environment
  2. Check out branch fix_find_pointwise_reduce and then merge branch develop.
  3. Run the test script: python benchmarks/dynamo/torchbench.py --inference --float16 -dcuda --performance --backend migraphx -k speech_transformer
  4. Multiple *.mxr models are created. Run the MIGraphX driver on the first one to see the failure: bin/driver compile ../../pytorch/fused_0.mxr
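Since step 4 produces several *.mxr files, it may help to check all of them rather than only the first. The following is a rough sketch, assuming the driver is invoked as in step 4 (bin/driver compile followed by the file path) and that a failing compile exits with a nonzero return code. The helper name compile_models and its run parameter are hypothetical and not part of MIGraphX.

```python
import subprocess
from pathlib import Path


def compile_models(model_dir, driver="bin/driver", run=subprocess.run):
    """Run `<driver> compile <file>` on each *.mxr file in model_dir.

    Returns a dict mapping file name to the driver's return code.
    The `run` argument exists only so the call can be stubbed out.
    """
    results = {}
    for path in sorted(Path(model_dir).glob("*.mxr")):
        proc = run([driver, "compile", str(path)], capture_output=True, text=True)
        results[path.name] = proc.returncode
    return results


if __name__ == "__main__":
    # Directory taken from step 4; adjust to the local layout.
    for name, code in compile_models("../../pytorch").items():
        print(f"{name}: {'ok' if code == 0 else f'failed ({code})'}")
```

A crash from runaway recursion would typically show up here as a nonzero or negative return code rather than a Python exception.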

bpickrel avatar May 06 '24 22:05 bpickrel

I just saw this same failure with the GoogleFnet model. An mxr file created from GoogleFnet is on hyd-7c-ZT09-02.amd.com

bpickrel avatar May 09 '24 23:05 bpickrel

Here's a reduced test case:

import migraphx

p = migraphx.program()
m = p.get_main_module()
# Literals: a [5,784,768] tensor (unused below) and a one-element divisor
x_0 = m.add_literal(migraphx.generate_argument(migraphx.shape(type="float_type", lens=[5,784,768]), 0))
x_1 = m.add_literal(migraphx.generate_argument(migraphx.shape(type="float_type", lens=[1]), 1))
p_x = m.add_parameter("x",migraphx.shape(type="float_type", lens=[5,784,768]))
# Mean over the last axis, broadcast back to the input shape, subtract from the input
x_3 = m.add_instruction(migraphx.op("reduce_mean", axes=[-1]), [p_x])
x_4 = m.add_instruction(migraphx.op("multibroadcast", out_lens=[5,784,768]), [x_3])
x_5 = m.add_instruction(migraphx.op("sub"), [p_x, x_4])
# Divide by the broadcast divisor, square the result, then sum over the last axis
x_6 = m.add_instruction(migraphx.op("multibroadcast", out_lens=[5,784,768]), [x_1])
x_7 = m.add_instruction(migraphx.op("div"), [x_5, x_6])
x_8 = m.add_instruction(migraphx.op("mul"), [x_7, x_7])
m.add_instruction(migraphx.op("reduce_sum", axes=[-1]), [x_8])

pfultz2 avatar May 10 '24 01:05 pfultz2