Out-of-tree build in a conda environment / overdependency problem
Hello there, I'm trying to build torch-mlir out-of-tree in a conda environment, but I'm failing to do so because of a number of problems.
I had to build mlir-python-bindings locally to get the MLIRPythonExtension* and MLIRPythonSources* targets populated (https://github.com/conda-forge/mlir-python-bindings-feedstock/issues/19), but then I get this kind of error:
CMake Error at /home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:674 (target_link_libraries):
Cannot find source file:
/home/yevhenii/.miniconda3/envs/mlir-dev/src/python/MLIRPythonExtension.Core/MainModule.cpp
Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm
.ccm .cxxm .c++m .h .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90
.f95 .f03 .hip .ispc
Call Stack (most recent call first):
/home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:228 (add_mlir_python_extension)
/home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:248 (_process_target)
python/CMakeLists.txt:112 (add_mlir_python_modules)
CMake Error at /home/yevhenii/.miniconda3/envs/mlir-dev/share/cmake-3.29/Modules/FindPython/Support.cmake:4078 (add_library):
No SOURCES given to target: TorchMLIRPythonModules.extension._mlir.dso
Call Stack (most recent call first):
/home/yevhenii/.miniconda3/envs/mlir-dev/share/cmake-3.29/Modules/FindPython3.cmake:549 (__Python3_add_library)
/home/yevhenii/.miniconda3/envs/mlir-dev/lib/python3.12/site-packages/pybind11/share/cmake/pybind11/pybind11NewTools.cmake:239 (python3_add_library)
/home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:642 (pybind11_add_module)
/home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:228 (add_mlir_python_extension)
/home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:248 (_process_target)
python/CMakeLists.txt:112 (add_mlir_python_modules)
This looks pretty weird to me, because for some reason the out-of-tree build references the source (not header) files of the mlir project. Is this a problem of overdependency on the torch-mlir side, or is it by design? If it is by design, could you give a detailed explanation of what the source files are for in the downstream project (I'm relatively new to llvm)?
I have no idea what the mlir-python-bindings-feedstock in conda is. I wrote much of those Python bindings, and afaik, the MLIR project never converged on releasing them as a real package or anything ("MLIR" is not an entity that means much outside its role as a library to build certain kinds of compilers). Those bindings were written to be used by other projects wishing to create a Python API for their compilers: MLIR compilers are monolithic and generally don't interoperate across binary boundaries (not saying I care for this state, just reporting the news). So torch-mlir is building a standalone Python API under its namespace, using sources from upstream to do so.
The CMake standalone example includes a minimal, standalone Python API setup: https://github.com/llvm/llvm-project/tree/main/mlir/examples/standalone
iirc, there is CMake magic that gives the source sets names, but basically it is all of the files under mlir/lib/Bindings/Python. Those get built into various pybind extensions that form the API. They all depend on the mlir-c headers (only) and link against something that exports those symbols.
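To make that concrete, here is a rough, abridged sketch of the pattern the standalone example uses (written from memory, so treat mlir/examples/standalone/python/CMakeLists.txt as the authoritative version; the Standalone* names belong to that example, not to torch-mlir):

```cmake
# Abridged sketch of a downstream project's python/CMakeLists.txt, modeled on
# the standalone example. Names like StandalonePythonSources / StandaloneCAPI
# are illustrative.
include(AddMLIRPython)

# Re-home the whole MLIR Python API under the downstream package prefix.
add_compile_definitions("MLIR_PYTHON_PACKAGE_PREFIX=mlir_standalone.")

# Declare the project's own Python sources and its pybind extension.
declare_mlir_python_sources(StandalonePythonSources)
declare_mlir_python_extension(StandalonePythonSources.Extension
  MODULE_NAME _standaloneDialects
  ADD_TO_PARENT StandalonePythonSources
  SOURCES
    StandaloneExtension.cpp
  EMBED_CAPI_LINK_LIBS
    StandaloneCAPI
)

# One aggregate CAPI shared library that exports the mlir-c symbols all of
# the extensions link against.
add_mlir_python_common_capi_library(StandalonePythonCAPI
  INSTALL_COMPONENT StandalonePythonModules
  INSTALL_DESTINATION python_packages/standalone/mlir_standalone/_mlir_libs
  OUTPUT_DIRECTORY "${MLIR_BINARY_DIR}/python_packages/standalone/mlir_standalone/_mlir_libs"
  RELATIVE_INSTALL_ROOT "../../../.."
  DECLARED_SOURCES
    StandalonePythonSources
    MLIRPythonSources.Core
)

# Instantiate the package. The upstream source sets (MLIRPythonSources and
# the MLIRPythonExtension.* sets) are compiled here, in the downstream build,
# which is why the downstream configure needs the upstream *source* files.
add_mlir_python_modules(StandalonePythonModules
  ROOT_PREFIX "${MLIR_BINARY_DIR}/python_packages/standalone/mlir_standalone"
  INSTALL_PREFIX "python_packages/standalone/mlir_standalone"
  DECLARED_SOURCES
    StandalonePythonSources
    MLIRPythonSources
  COMMON_CAPI_LINK_LIBS
    StandalonePythonCAPI
)
```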
It's been a long time since I've looked at any of that, but I think the install rules here (https://github.com/llvm/llvm-project/blob/main/mlir/cmake/modules/AddMLIRPython.cmake#L174) are what is making sure that the sources and corresponding targets are exported at install time (which I would presume is what conda is doing, but I don't really know). If working from an installed LLVM, it would need to install/export those sources and targets. I don't know of anyone else trying to use torch-mlir standalone in this way (a lot of the developers use torch-mlir as a C++ library dep), but I am aware of other projects that consume MLIR's Python API sources in this way.
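For intuition only (this is an illustration of the mechanism, not the literal AddMLIRPython.cmake code): the declared source sets are installed as plain files under <prefix>/src/python/<set-name>, and the exported targets reference them there, which is exactly the path in the error above.

```cmake
# Illustration of the mechanism, not the literal upstream install rules.
# Upstream MLIR installs each declared Python source set's files into the
# install tree, e.g. roughly:
install(
  DIRECTORY lib/Bindings/Python/           # where MainModule.cpp etc. live
  DESTINATION src/python/MLIRPythonExtension.Core
)
# ...and exports CMake targets whose source lists point at
#   ${CMAKE_INSTALL_PREFIX}/src/python/MLIRPythonExtension.Core/...
# A downstream project like torch-mlir then compiles those installed sources
# itself, so if a packaging step strips <prefix>/src, the exported targets
# dangle and a downstream configure fails with exactly the
# "Cannot find source file" error shown above.
```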
(I should also say that all of this is open to be improved. The project contributors to these things have not invested here and largely use torch-mlir as a C++ library, as mentioned. But I am certainly not opposed to folks who care about other ways of using it investing to make it work)
Thank you @stellaraccident! I've resolved the issue with the missing sources - it turned out the feedstock build script was removing the /src folder. I was then able to successfully generate the cmake configuration inside the conda environment against a pre-built out-of-tree llvm-project. I'm currently seeing issues both with and without the stablehlo build:
Without stablehlo, TorchConversionDialect.h.inc has not been generated yet by the time lib/RefBackend/CMakeFiles/obj.TorchMLIRRefBackend.dir/RefBackend.cpp.o is being built, so that compile fails.
With stablehlo, the build of stablehlo itself fails (something related to the llvm install), but TorchConversionDialect.h.inc does get generated.
My question: is it possible that a dependency between TorchMLIRRefBackend/RefBackend.cpp and TorchConversionDialect.h.inc is missing in the cmake config?
My question: is it possible that a dependency between TorchMLIRRefBackend/RefBackend.cpp and TorchConversionDialect.h.inc is missing in the cmake config?
Quite possible. I believe ninja now has a mode where it can check for some things like this, but I don't think anyone has spent time sanitizing the different build modes of this project.
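If it does turn out to be a missing dependency, the usual MLIR wiring looks roughly like the sketch below: the .inc files are produced by a public tablegen target, and every library whose sources include the generated header lists that target under DEPENDS. (The target names here are hypothetical; check torch-mlir's actual CMakeLists.txt files for the real ones.)

```cmake
# Hypothetical sketch of the standard MLIR tablegen wiring; the target names
# are illustrative, not necessarily torch-mlir's.

# In the dialect's include/ CMakeLists.txt: generate the .inc files and give
# the generation step a named target.
set(LLVM_TARGET_DEFINITIONS TorchConversionOps.td)
mlir_tablegen(TorchConversionOps.h.inc -gen-op-decls)
mlir_tablegen(TorchConversionOps.cpp.inc -gen-op-defs)
mlir_tablegen(TorchConversionDialect.h.inc -gen-dialect-decls)
mlir_tablegen(TorchConversionDialect.cpp.inc -gen-dialect-defs)
add_public_tablegen_target(TorchMLIRTorchConversionOpsIncGen)

# In lib/RefBackend/CMakeLists.txt: any library that includes the generated
# header must list the tablegen target in DEPENDS, otherwise ninja is free to
# compile RefBackend.cpp before the .inc file exists.
add_mlir_library(TorchMLIRRefBackend
  RefBackend.cpp

  DEPENDS
    TorchMLIRTorchConversionOpsIncGen

  LINK_LIBS PUBLIC
    MLIRIR
)
```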
This should be fixed now, but I'm waiting for you to test it before I catch up the mlir-python-bindings feedstock (that I wasn't aware of until a few days ago) to the most current version.