Add a build/test workflow for Windows

Open ScottTodd opened this issue 11 months ago • 2 comments

We've seen several downstream build breaks from torch-mlir due to missing test coverage on non-Linux platforms such as Windows. At a minimum, a nightly CI build using GitHub-hosted runners would provide earlier signal for build issues.

Existing workflows

  • https://github.com/llvm/torch-mlir/blob/main/.github/workflows/ci.yml only runs on Linux
  • https://github.com/llvm/torch-mlir/blob/main/.github/workflows/buildRelease.yml runs on multiple platforms but was last triggered 1 year ago (https://github.com/llvm/torch-mlir/actions/workflows/buildRelease.yml)

Expanding to new platforms

I see scripts used by ci.yml that could be forked or generalized:

  • https://github.com/llvm/torch-mlir/blob/main/build_tools/ci/build_posix.sh
  • https://github.com/llvm/torch-mlir/blob/main/build_tools/ci/test_posix.sh

A workflow could also be added that runs commands directly. I've been moving https://github.com/iree-org/iree (a downstream user of torch-mlir) away from such scripts. Instead, I'm opting to make the build system work better out of the box using default options, with things like ccache and the choice of compiler (e.g. gcc/clang) left to the user or to a GitHub Action that configures environment variables.

ScottTodd, Jan 27 '25 21:01

I gave this a shot by reusing https://github.com/llvm/torch-mlir/blob/main/build_tools/python_deploy/build_windows_ci.sh, which is used for building the Windows wheels in the torch-mlir-release repo. The build worked fine, but testing fails because signal.SIGALRM (used for timeout handling; I'm not very familiar with this piece of the testing code) is not available on Windows. That is a separate problem that needs solving to enable testing on Windows even outside of the CI workflow, so I will create a separate issue for it. However, is there a benefit to still add a build only CI for now?
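For context, here is a minimal sketch of a timeout wrapper that does not depend on signal.SIGALRM. It is not the code in torch-mlir's test framework; `sigalrm_available` and `run_with_timeout` are hypothetical names used only for illustration, and the thread-based approach is just one assumed alternative.

```python
import signal
import threading


def sigalrm_available() -> bool:
    """Return True if this platform exposes signal.SIGALRM (it is absent on Windows)."""
    return hasattr(signal, "SIGALRM")


def run_with_timeout(fn, timeout_seconds, *args, **kwargs):
    """Run fn in a worker thread and raise TimeoutError if it does not finish in time.

    Hypothetical helper. Caveat: Python cannot forcibly stop a thread, so a
    timed-out call keeps running in the background as a daemon thread.
    """
    result = {}

    def target():
        try:
            result["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # forward any failure to the caller
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_seconds)  # wait at most timeout_seconds

    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_seconds} s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Because the timed-out work cannot be cancelled, a subprocess-based timeout may be a better fit when a hung test must actually be terminated.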

sahas3, Feb 19 '25 13:02

> However, is there a benefit to still add a build only CI for now?

I would say yes. I would expect most platform differences to come into play at build time, not test time. Could also run a subset of tests (e.g. unit tests, but not integration tests).


As for that failure, it comes from the timeout handling in https://github.com/llvm/torch-mlir/blob/a265d283357f572c5437f9217ea85fa9770d374c/projects/pt1/python/torch_mlir_e2e_test/framework.py#L310-L323

I'd suggest (to whoever looks into it) checking out https://pypi.org/project/pytest-timeout/. Ideally, just use that plugin with pytest rather than implementing something custom. Short of that, see that project's notes about timeout methods and portability.
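As a rough sketch of what that could look like, the snippet below builds a pytest command line that uses pytest-timeout's `--timeout` and `--timeout-method` options. The helper names `pick_timeout_method` and `build_pytest_command` are hypothetical, the test path and the 300 second limit are placeholders, and it assumes the tests in question can be run through pytest with the plugin installed. As far as I understand, the plugin already falls back to a non-signal method where SIGALRM is missing, so the explicit choice here is only to make the platform difference visible.

```python
import signal
import sys


def pick_timeout_method() -> str:
    """Choose a pytest-timeout method that works on the current platform."""
    # The "signal" method relies on SIGALRM, which Windows does not provide;
    # the "thread" method is the portable alternative.
    return "signal" if hasattr(signal, "SIGALRM") else "thread"


def build_pytest_command(test_path: str, timeout_seconds: int) -> list[str]:
    """Build a pytest invocation with a per-test timeout (hypothetical helper)."""
    return [
        sys.executable, "-m", "pytest", test_path,
        f"--timeout={timeout_seconds}",
        f"--timeout-method={pick_timeout_method()}",
    ]


# Placeholder path and limit; pass the result to subprocess.run() to execute it.
command = build_pytest_command("path/to/tests", 300)
```

The same limit can instead be applied to individual tests with the plugin's `@pytest.mark.timeout(...)` marker.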

ScottTodd, Feb 19 '25 16:02