
Python bindings build failure on Windows: Generating Lazy Tensor Core IR Nodes

Open · oroppas opened this issue 2 years ago · 2 comments

Generating Lazy Tensor Core IR Nodes fails on Windows, specifically in build_tools\autogen_ltc_backend.py. The script fails at several points, so I list the issues and fixes step by step:

1. subprocess.check_output behavior on Windows

subprocess.check_output requires shell=True on Windows when calling a shell built-in command:

[101/142] Generating Lazy Tensor Core IR Nodes
FAILED: tools/torch-mlir/generated_backend.hash tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/RegisterLazy.cpp tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/shape_inference.cpp D:/packages/torch-mlir/build/tools/torch-mlir/generated_backend.hash D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/RegisterLazy.cpp D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/shape_inference.cpp
cmd.exe /C "cd /D D:\packages\torch-mlir\build\tools\torch-mlir\python\torch_mlir\csrc\base_lazy_backend && C:\Users\ryuta\AppData\Local\Programs\Python\Python39\python.exe D:/packages/torch-mlir/torch-mlir/build_tools/autogen_ltc_backend.py -b D:/packages/torch-mlir/build/tools/torch-mlir"
Traceback (most recent call last):
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 537, in <module>
    main(args)
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 498, in main
    generator()
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 481, in __call__
    self.generate_native_functions()
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 230, in generate_native_functions
    for op in subprocess.check_output(
  File "C:\Users\ryuta\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\ryuta\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\ryuta\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\ryuta\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

https://github.com/llvm/torch-mlir/blob/c0630da678ea015ca8757340633c654d5539adac/build_tools/autogen_ltc_backend.py#L221-L225

This can be fixed by adding shell=True:

            op[6:]
            for op in subprocess.check_output(
                cmd,
                encoding="utf-8",
                shell=True
            )
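To illustrate the difference with a standalone example, here is a minimal sketch using the cmd.exe built-in echo (not the actual command from the script):

    import subprocess

    # On Windows, "echo" is a cmd.exe built-in, not an executable on PATH,
    # so CreateProcess cannot find it and raises WinError 2 without a shell:
    try:
        subprocess.check_output(["echo", "hello"], encoding="utf-8")
    except FileNotFoundError:
        pass  # [WinError 2] The system cannot find the file specified

    # shell=True routes the command string through cmd.exe, which
    # resolves the built-in:
    out = subprocess.check_output("echo hello", encoding="utf-8", shell=True)
    print(out.strip())  # hello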

2. grep on Windows (or lack thereof)

This one is pretty simple. Windows does not have grep:

[103/142] Generating Lazy Tensor Core IR Nodes
FAILED: tools/torch-mlir/generated_backend.hash tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/RegisterLazy.cpp tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/shape_inference.cpp D:/packages/torch-mlir/build/tools/torch-mlir/generated_backend.hash D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/RegisterLazy.cpp D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/shape_inference.cpp
cmd.exe /C "cd /D D:\packages\torch-mlir\build\tools\torch-mlir\python\torch_mlir\csrc\base_lazy_backend && C:\Users\ryuta\AppData\Local\Programs\Python\Python39\python.exe D:/packages/torch-mlir/torch-mlir/build_tools/autogen_ltc_backend.py -b D:/packages/torch-mlir/build/tools/torch-mlir"
'grep' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 537, in <module>
    main(args)
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 498, in main
    generator()
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 481, in __call__
    self.generate_native_functions()
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 230, in generate_native_functions
    for op in subprocess.check_output(
  File "C:\Users\ryuta\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\ryuta\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['grep', '-o', 'aten::[0-9a-zA-Z_\\.]\\+', 'D:\\packages\\torch-mlir\\torch-mlir\\include\\torch-mlir\\Dialect\\Torch\\IR\\GeneratedTorchOps.td']' returned non-zero exit status 1.

https://github.com/llvm/torch-mlir/blob/c0630da678ea015ca8757340633c654d5539adac/build_tools/autogen_ltc_backend.py#L215-L218

A grep-like command can be defined for both powershell.exe and cmd.exe:

        # psutil.Process(os.getpid()) is the Python interpreter itself
        # ("python.exe"); the invoking shell is its parent process.
        if psutil.Process(os.getpid()).parent().name() == "powershell.exe":
            cmd = ["powershell", "-Command", "& {"
                + f"(Get-Content {self.torch_ops_file} | Select-String -Pattern \"aten::[0-9a-zA-Z_\\.]+\").Matches.Value"
                + "}"]
        else:
            cmd = f"for /f \"tokens=7 delims=` \" %a in ('findstr /R /C:\"aten::\" {self.torch_ops_file}') do @echo %a"

3. yaml.scanner.ScannerError

This one was a bit confusing at first glance.

[104/142] Generating Lazy Tensor Core IR Nodes
FAILED: tools/torch-mlir/generated_backend.hash tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/RegisterLazy.cpp tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/shape_inference.cpp D:/packages/torch-mlir/build/tools/torch-mlir/generated_backend.hash D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/RegisterLazy.cpp D:/packages/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/shape_inference.cpp
cmd.exe /C "cd /D D:\packages\torch-mlir\build\tools\torch-mlir\python\torch_mlir\csrc\base_lazy_backend && C:\Users\ryuta\AppData\Local\Programs\Python\Python39\python.exe D:/packages/torch-mlir/torch-mlir/build_tools/autogen_ltc_backend.py -b D:/packages/torch-mlir/build/tools/torch-mlir"
Traceback (most recent call last):
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 536, in <module>
    main(args)
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 497, in main
    generator()
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 481, in __call__
    self.generate_shape_inference()
  File "D:\packages\torch-mlir\torch-mlir\build_tools\autogen_ltc_backend.py", line 341, in generate_shape_inference
    parsed_backend_yaml = parse_backend_yaml(
  File "d:\packages\pytorch\pytorch\torchgen\gen_backend_stubs.py", line 58, in parse_backend_yaml
    yaml_values = yaml.load(f, Loader=YamlLoader)
  File "C:\Users\ryuta\AppData\Local\Programs\Python\Python39\lib\site-packages\yaml\__init__.py", line 81, in load
    return loader.get_single_data()
  File "C:\Users\ryuta\AppData\Local\Programs\Python\Python39\lib\site-packages\yaml\constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "yaml\_yaml.pyx", line 673, in yaml._yaml.CParser.get_single_node
  File "yaml\_yaml.pyx", line 687, in yaml._yaml.CParser._compose_document
  File "yaml\_yaml.pyx", line 731, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 845, in yaml._yaml.CParser._compose_mapping_node
  File "yaml\_yaml.pyx", line 729, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 808, in yaml._yaml.CParser._compose_sequence_node
  File "yaml\_yaml.pyx", line 860, in yaml._yaml.CParser._parse_next_event
yaml.scanner.ScannerError: while scanning a simple key
  in "D:\packages\torch-mlir\build\tools\torch-mlir\generated_native_functions.yaml", line 41, column 1
could not find expected ':'
  in "D:\packages\torch-mlir\build\tools\torch-mlir\generated_native_functions.yaml", line 42, column 1

It turns out that os.linesep on Windows is \r\n:

https://github.com/llvm/torch-mlir/blob/c0630da678ea015ca8757340633c654d5539adac/build_tools/autogen_ltc_backend.py#L221-L227

and split(os.linesep) fails to split the string returned by the command, which is delimited by \n (text-mode subprocess output has its line endings normalized to \n). Removing os.linesep from the split fixes the problem.
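Equivalently, str.splitlines() handles \n, \r\n, and \r uniformly and sidesteps the platform question altogether; a minimal sketch of the failure and the fix:

    # On Windows, os.linesep is "\r\n", so split(os.linesep) finds no
    # separator in "\n"-delimited text and returns the whole string:
    output = "aten::add\naten::mul\naten::sub"

    assert output.split("\r\n") == [output]  # the failing split: one chunk
    assert output.splitlines() == ["aten::add", "aten::mul", "aten::sub"]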

oroppas · Aug 29 '22 05:08

Please feel free to submit PRs for these. Thanks for getting the Windows build going.

powderluv · Aug 29 '22 06:08

I unfortunately do not have any Windows machines to help debug or test this. But like @powderluv said, please feel free to submit a PR with the fixes and I'll be sure to review it.

antoniojkim · Aug 29 '22 15:08