torch-mlir
Issue generating heavydep tests
I'm unable to generate heavydep tests at top of main using the script in build_tools; it fails with the following output:
Traceback (most recent call last):
File "/home/george/.pyenv/versions/3.8-dev-debug/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/george/.pyenv/versions/3.8-dev-debug/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/george/mlir-npcomp/build_tools/torchscript_e2e_heavydep_tests/main.py", line 12, in <module>
from . import train_models
File "/home/george/mlir-npcomp/build_tools/torchscript_e2e_heavydep_tests/train_models.py", line 143, in <module>
neural_net_ts = generate_graph(neural_net_model, (input, ), training_fn)
File "/home/george/mlir-npcomp/build_tools/torchscript_e2e_heavydep_tests/train_models.py", line 65, in generate_graph
fx_g = make_fx(training_fn,
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/fx/experimental/proxy_tensor.py", line 285, in wrapped
t = dispatch_trace(wrap_key(f, args), tracer=fx_tracer, concrete_args=tuple(phs))
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/fx/experimental/proxy_tensor.py", line 178, in dispatch_trace
graph = tracer.trace(root, concrete_args)
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/fx/_symbolic_trace.py", line 714, in trace
(self.create_arg(fn(*args)),),
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/fx/_symbolic_trace.py", line 549, in flatten_fn
tree_out = root_fn(*tree_args)
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/fx/experimental/proxy_tensor.py", line 202, in wrapped
out = f(*tree_args)
File "/home/george/mlir-npcomp/build_tools/torchscript_e2e_heavydep_tests/train_models.py", line 131, in training_fn
optim.step()
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/optim/optimizer.py", line 114, in wrapper
return func(*args, **kwargs)
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/autograd/profiler.py", line 451, in __exit__
torch.ops.profiler._record_function_exit(self.handle)
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/_ops.py", line 148, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: Expected temporary cpp type wrapper of type at::RecordFunction
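My reading of this error (an assumption, not verified against PyTorch internals): make_fx wraps every traced value in an FX proxy, but the profiler's _record_function_exit op expects the concrete at::RecordFunction handle, so the dispatcher rejects the proxied value. A torch-free analogy of that failure mode:

```python
class Proxy:
    """Stand-in for an FX tracing proxy: wraps whatever value it is given."""
    def __init__(self, value):
        self.value = value

class RecordFunctionHandle:
    """Stand-in for the concrete C++ handle the profiler op expects."""

def record_function_exit(handle):
    # Mimics an op that dispatches on the concrete C++ type of its argument
    # and cannot accept a tracing proxy in its place.
    if not isinstance(handle, RecordFunctionHandle):
        raise RuntimeError(
            "Expected temporary cpp type wrapper of type at::RecordFunction")

record_function_exit(RecordFunctionHandle())  # fine: concrete handle
try:
    record_function_exit(Proxy(RecordFunctionHandle()))  # traced value: rejected
except RuntimeError as e:
    print(e)
```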
At @makslevental's suggestion I ran
sed -i.bak -E 's/if not hooked/if not True/g' /home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/optim/optimizer.py
(so the hook-guarded branch is always skipped). That gets past the RecordFunction error, but it then fails with the following:
Traceback (most recent call last):
File "/home/george/.pyenv/versions/3.8-dev-debug/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/george/.pyenv/versions/3.8-dev-debug/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/george/mlir-npcomp/build_tools/torchscript_e2e_heavydep_tests/main.py", line 12, in <module>
from . import train_models
File "/home/george/mlir-npcomp/build_tools/torchscript_e2e_heavydep_tests/train_models.py", line 143, in <module>
neural_net_ts = generate_graph(neural_net_model, (input, ), training_fn)
File "/home/george/mlir-npcomp/build_tools/torchscript_e2e_heavydep_tests/train_models.py", line 76, in generate_graph
ts_g = torch.jit.script(fx_g)
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/jit/_script.py", line 1286, in script
return torch.jit._recursive.create_script_module(
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/jit/_recursive.py", line 476, in create_script_module
return create_script_module_impl(nn_module, concrete_type, stubs_fn)
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
File "/home/george/mlir-npcomp/heavy_venv/lib/python3.8/site-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
RuntimeError:
attribute lookup is not defined on builtin:
File "<eval_with_key>.2", line 5
def forward(self, params_1, params_2, params_3, params_4, args_1):
t_default = torch.ops.aten.t.default(params_1)
~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
addmm_default = torch.ops.aten.addmm.default(params_2, args_1, t_default); t_default = None
relu_default = torch.ops.aten.relu.default(addmm_default); addmm_default = None
To fix that I replaced ts_g = torch.jit.script(fx_g) in build_tools/torchscript_e2e_heavydep_tests/train_models.py with ts_g = torch.jit.trace(fx_g, inputs), but that fails with an argument mismatch:
forward() missing 4 required positional arguments: 'params_2', 'params_3', 'params_4', and 'args_1'
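That mismatch is expected: torch.jit.trace calls forward with the example inputs splatted positionally, and the make_fx-produced module takes its flattened parameters as explicit arguments, so tracing with only the model input leaves the param slots unfilled. A torch-free sketch (names mirror the generated signature; with torch, the fix would presumably be passing all five tensors, e.g. torch.jit.trace(fx_g, (params_1, params_2, params_3, params_4, input)) — argument names hypothetical):

```python
def forward(params_1, params_2, params_3, params_4, args_1):
    # Stand-in for the FX-generated forward: parameters are explicit inputs.
    return params_1 + params_2 + params_3 + params_4 + args_1

def trace(fn, example_inputs):
    # torch.jit.trace-style invocation: example inputs are passed positionally.
    return fn(*example_inputs)

try:
    trace(forward, (1.0,))  # only the model input: four param slots missing
except TypeError as e:
    print(e)  # forward() missing 4 required positional arguments: ...

print(trace(forward, (1.0, 2.0, 3.0, 4.0, 5.0)))  # all five slots filled
```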
Needless to say, the band-aid solutions aren't working. Maks said this looks like an upstream bug, but I wanted to file this issue to get more eyes on it and advice on how to move forward.
Can you identify which test is failing and when it was added and if it ever worked?
From my testing it seems to be only the two tests in train_models.py, namely the basic NeuralNet training and BERT training; commenting them out fixes the issue.
It was added in this commit. I haven't been able to make it work even after reverting to that commit, but I'll ask @pashu123 whether he verified it when he wrote it.
Hi @gpetters94, @silvasean, I'm also hitting the "attribute lookup is not defined on builtin" error; is there a solution or any idea? I'm trying to export the forward and backward computation graphs for ResNet-50 (based on the torch dialect).
Log:
Traceback (most recent call last):
File "004.py", line 37, in <module>
aot_module(models.resnet50(pretrained=True), print_graph("forward"), print_graph("backward"))(torch.randn(1,3,200,200))
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/functorch/_src/aot_autograd.py", line 702, in forward
return compiled_f(
File "/root/.local/lib/python3.8/site-packages/functorch/_src/aot_autograd.py", line 621, in returned_function
compiled_fn = create_aot_dispatcher_function(
File "/root/.local/lib/python3.8/site-packages/functorch/_src/aot_autograd.py", line 355, in create_aot_dispatcher_function
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config)
File "/root/.local/lib/python3.8/site-packages/functorch/_src/aot_autograd.py", line 259, in aot_dispatch_autograd
compiled_fw = aot_config.fw_compiler(fw_module, flat_args)
File "004.py", line 17, in f
f_script = torch.jit.script(fx_g)
File "/opt/conda/lib/python3.8/site-packages/torch/jit/_script.py", line 1286, in script
return torch.jit._recursive.create_script_module(
File "/opt/conda/lib/python3.8/site-packages/torch/jit/_recursive.py", line 476, in create_script_module
return create_script_module_impl(nn_module, concrete_type, stubs_fn)
File "/opt/conda/lib/python3.8/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
File "/opt/conda/lib/python3.8/site-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
RuntimeError:
attribute lookup is not defined on builtin:
File "<eval_with_key>.1", line 5
def forward(self, primals_1, primals_2, ..., primals_321):
convolution_default = torch.ops.aten.convolution.default(primals_321, primals_1, None, [2, 2], [3, 3], [1, 1], False, [0, 0], 1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
native_batch_norm_default = torch.ops.aten.native_batch_norm.default(convolution_default, primals_2, primals_3, primals_162, primals_163, True, 0.1, 1e-05); primals_3 = None
getitem = native_batch_norm_default[0]
Does your code work if you do torch.ops.aten.convolution instead of torch.ops.aten.convolution.default?
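If dropping the .default suffix does help, the fix could be applied mechanically over the FX graph before scripting: walk fx_g.graph.nodes and replace each overload target (e.g. torch.ops.aten.t.default) with its overload packet (torch.ops.aten.t). A torch-free sketch of just the name mapping (the real pass would test isinstance(node.target, torch._ops.OpOverload), substitute node.target.overloadpacket, and call fx_g.recompile(); the string form here is purely for illustration):

```python
def strip_default_overload(target: str) -> str:
    """Map an overload name like 'aten.t.default' back to its packet 'aten.t'.

    Sketch only: in an actual FX pass the targets are OpOverload objects,
    not strings, and the replacement is node.target.overloadpacket.
    """
    suffix = ".default"
    if target.endswith(suffix):
        return target[: -len(suffix)]
    return target

for name in ["aten.t.default", "aten.addmm.default", "aten.relu"]:
    print(name, "->", strip_default_overload(name))
```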