
Difficult CMake build system producing unexpected results with MKL >= 2021.3.0

Open gtkramer opened this issue 1 year ago • 5 comments

I am part of an internal team with Intel and we are using the C++ version of PyTorch for some of our work. We are interested in the CPU version of IPEX to ensure our inferencing is running as fast as possible on modern server CPUs.

We need to build IPEX 2.2.0+cpu from source, and as I began working with the build system, a few observations came to mind.

  1. It's not possible to call CMake directly to just build and install libintel-ext-pt-cpu.so. setup.py generates a version.h that the project's sources need in order to compile; ideally, CMake itself would generate this file.
  2. Ideally we should let CMake fully detect Python and set all resulting variables on its own rather than passing PYTHON_EXECUTABLE and PYTHON_INCLUDE_DIR as configuration options. We already call find_package(Python COMPONENTS Interpreter Development) in intel_extension_for_pytorch/csrc/CMakeLists.txt, which creates all of the variables needed. As long as the virtual environment is added to CMAKE_PREFIX_PATH, this should work.
  3. Doing a pip install for MKL packages in CMake as a part of FindoneMKL.cmake is unexpected. This would be better put into setup.py as a requirement so that this is done automatically by pip. This should be fine because MKL is required to build the project. If not in a virtual environment, which typically wouldn't be needed for building C++ code, this can end up installing the MKL packages into unexpected locations.
  4. MKL >= 2021.3.0 introduced first-party CMake support. When building with these versions, from what I can tell, MKL is pulled in twice: once via FindoneMKL.cmake (statically, with 32-bit integers and GNU threading) and once via FindOMP.cmake (dynamically, with 64-bit integers and Intel threading). It took a great deal of time to figure out what was going on until I found the two MKL-related find_package calls: find_package(oneMKL) in csrc/cpu/CMakeLists.txt and tests/cpu/cpp/CMakeLists.txt, and find_package(MKL) in FindOMP.cmake. MKL should ideally be pulled in once, and in one way only. My builds were failing on the dynamic MKL import even though the build logs reported that the static MKL import had succeeded.
  5. Please consider unifying the CMake projects. I would expect that when building the C++ side of things with CMake, a single project would also be capable of building the tests without a separate CMake invocation (which setup.py performs) to configure everything again. The root CMakeLists.txt can only build csrc/cpu/CMakeLists.txt. It can't be used to build tests/cpu/cpp/CMakeLists.txt, which was unexpected.
  6. Please consider renaming the custom find module for OpenMP (FindOMP.cmake) to FindOpenMP.cmake and prepending its directory to CMAKE_MODULE_PATH so it is chosen first. This allows IPEX to write standard CMake.
  7. Please consider using something like CTest to run testing on the C++ side. It has excellent integration with Google Test. Going along with suggestion 5 for project unification, tests can be conditionally enabled just by doing include(CTest), after which the user can set -DBUILD_TESTING=ON/OFF when configuring the project with CMake. Or, this can be an option for which the project defines a default value. Hard-coding relative paths back to the root CMake project to configure testing is not easy to read, understand, or (I imagine) maintain.
  8. In cmake/IPEXConfig.cmake.in, please consider doing something like target_link_libraries(intel-ext-pt-cpu INTERFACE "-Wl,--push-state,--no-as-needed" "${IPEX_CPU_CORE_LIBRARY}" "-Wl,--pop-state"). This follows what MKL does. It's easier to read target_include_directories and target_link_libraries than one massive set_target_properties. Additionally, it's more mindful about preserving the state of the linker, making it very clear that --no-as-needed applies only to the IPEX library. It also gives the intel-ext-pt-cpu target that consuming CMake projects use a way to depend on the torch target that comes from TorchConfig.cmake: it can be added with target_link_libraries, as TorchVision does with target_link_libraries(${PN}::${PN} INTERFACE torch).
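To make suggestions 1 and 2 concrete, here is a minimal sketch of what the root CMakeLists.txt could do instead of setup.py. It assumes a hypothetical template file csrc/utils/version.h.in (which does not exist in the repository today) containing @IPEX_VERSION@-style placeholders mirroring the header shown later in this thread:

```cmake
# Sketch only: generate version.h at configure time rather than in setup.py.
# Assumes a hypothetical csrc/utils/version.h.in template with @VAR@ placeholders.
find_package(Git QUIET)
execute_process(
  COMMAND "${GIT_EXECUTABLE}" rev-parse --short HEAD
  WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}"
  OUTPUT_VARIABLE IPEX_GIT_SHA
  OUTPUT_STRIP_TRAILING_WHITESPACE)
configure_file(
  "${CMAKE_CURRENT_SOURCE_DIR}/csrc/utils/version.h.in"
  "${CMAKE_CURRENT_BINARY_DIR}/csrc/utils/version.h"
  @ONLY)

# Let CMake detect Python itself; a virtual environment on
# CMAKE_PREFIX_PATH is picked up automatically, with no need to pass
# PYTHON_EXECUTABLE or PYTHON_INCLUDE_DIR by hand.
find_package(Python COMPONENTS Interpreter Development)
```

With this in place, the generated header lives in the build tree and the source tree stays clean across configurations.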
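Likewise, suggestions 7 and 8 could look roughly like the following. This is a sketch, not the current IPEX code; IPEX_INCLUDE_DIRS is assumed to be set earlier in the config file:

```cmake
# Suggestion 7: opt-in tests via the standard BUILD_TESTING switch.
include(CTest)
if(BUILD_TESTING)
  add_subdirectory(tests/cpu/cpp)
endif()

# Suggestion 8: sketch of a reworked cmake/IPEXConfig.cmake.in using
# per-property commands instead of one large set_target_properties call.
add_library(intel-ext-pt-cpu INTERFACE IMPORTED)
target_include_directories(intel-ext-pt-cpu INTERFACE "${IPEX_INCLUDE_DIRS}")
# Scope --no-as-needed to the IPEX library only, restoring linker state after.
target_link_libraries(intel-ext-pt-cpu INTERFACE
  "-Wl,--push-state,--no-as-needed" "${IPEX_CPU_CORE_LIBRARY}" "-Wl,--pop-state")
# Reuse the torch target from TorchConfig.cmake, as TorchVision does.
target_link_libraries(intel-ext-pt-cpu INTERFACE torch)
```

Consumers would then get include paths, link flags, and the torch dependency transitively from a single target.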

I'm guessing the complexity here arose from needing to use CMake to build the C++ project, but also needing to build a wheel in Python. I've found that building wheels in Python is best done with Python's own infrastructure. Bridging this gap with CMake is often difficult since there is no standard process for this, so I can understand how things are the way they are. At least for the C++ side of things, the changes listed above would make building this project MUCH easier for us if we could call out to CMake directly. I also ran into some AVX512 FP16 build issues due to invalid conversions from c10::Half to _Float16 (we are building with GCC 13.2 on modern enough Xeons), but I'll file a separate issue for that along with the patch I have for the fixes.

gtkramer avatar Feb 13 '24 17:02 gtkramer

Have you tried the IPEX CPPSDK? https://intel.github.io/intel-extension-for-pytorch/#installation?platform=cpu&version=v2.2.0%2Bcpu&os=linux%2Fwsl2&package=cppsdk

jingxu10 avatar Feb 15 '24 02:02 jingxu10

If you want to compile from source, you can use the command below: LIBTORCH_PATH=<libtorch_path> python setup.py bdist_cppsdk

jingxu10 avatar Feb 15 '24 03:02 jingxu10

To illustrate the difficulty I had here better, here is a snippet of the recipe I'm using to successfully build IPEX for Intel CPUs:

# Source MKL environment variables, which is normal usage for MKL
. $MKL_INSTALL_DIR/mkl/latest/env/vars.sh
export LDFLAGS="$LDFLAGS -Wl,-rpath,${LIBRARY_PATH%:}"

cd $IPEX_SRC_DIR

# Create missing sources first
IPEX_GIT_SHA="$(git rev-parse --short HEAD)"
IPEX_VERSION="$(awk '{print $2}' version.txt | tr '\n' '.' | sed 's/\.$//')"
TORCH_GIT_SHA="$(cat $PYTORCH_INSTALL_DIR/build-hash)"
BUILD_TYPE=Release

# Obtained from the create_version_files function in setup.py
# The C++ sources won't build without this
cat > csrc/utils/version.h <<EOF
#pragma once
#include <string>

namespace torch_ipex {

const std::string __version__()
{ return "${IPEX_VERSION}"; }

const std::string __gitrev__()
{ return "${IPEX_GIT_SHA}"; }

const std::string __torch_gitrev__()
{ return "${TORCH_GIT_SHA}"; }

const std::string __build_type__()
{ return "${BUILD_TYPE}"; }

}  // namespace torch_ipex
EOF

# Then apply patches
patch -p1 < $SRC_DIR/0001-Fix-AVX512-FP16-build-issues.patch
patch -p1 < $SRC_DIR/0002-Fix-IPEX-CMake-configuration.patch
patch -p1 < $SRC_DIR/0003-Remove-custom-find-module-for-OpenMP.patch
patch -p1 < $SRC_DIR/0004-Rely-on-MKLROOT-to-find-oneMKL.patch
patch -p1 < $SRC_DIR/0005-Make-oneMKL-link-type-configurable.patch
patch -p1 < $SRC_DIR/0006-Fix-oneMKL-dynamic-library-link-line.patch

mkdir build && cd $_

cmake .. \
-DCMAKE_BUILD_TYPE="${BUILD_TYPE}" \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX="$INSTALL_DIR" \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DBUILD_MODULE_TYPE=CPU \
-DIPEX_PROJ_NAME=intel_extension_for_pytorch \
-DCMAKE_PROJECT_VERSION="${IPEX_VERSION}" \
-DPYTHON_INCLUDE_DIR="$(python -c 'import sysconfig; print(sysconfig.get_paths()["include"])')" \
-DBUILD_STATIC_ONEMKL=OFF

cmake --build . --parallel $(nproc)
cmake --install .

This is a lot. Ideally, I wouldn't need to modify the sources for IPEX to just build the C++ project, or do that build work through Python. And, ideally, I wouldn't need to define obscure CMake configuration variables like IPEX_PROJ_NAME, CMAKE_PROJECT_VERSION, and PYTHON_INCLUDE_DIR to get this to work.

Not building the CPPSDK reduces the build time here since we don't need to compress the end result into a package and then re-extract it just to install it to the correct location. I am very glad there was a way to bypass this and reduce the overall build time.

gtkramer avatar Feb 17 '24 01:02 gtkramer

@xuhancn

jingxu10 avatar Feb 19 '24 07:02 jingxu10

I also ran into some AVX512 FP16 build issues due to invalid conversions from c10::Half to _Float16 (we are building with GCC 13.2 on modern enough Xeons), but I'll file a separate issue for that along with the patch I have for the fixes.

@gtkramer, not sure what happened with your pull-request, so I created a new one - https://github.com/intel/intel-extension-for-pytorch/pull/587

@xuhancn, I noticed you tried to fix build with Clang previously, could you check my pull-request? Thanks!

AngryLoki avatar Apr 09 '24 18:04 AngryLoki