WIP: Prevent path conflict in builds

Open AyodeAwe opened this issue 1 year ago • 9 comments

Fixes #1528.

Contributes to https://github.com/rapidsai/build-planning/issues/54 and https://github.com/rapidsai/build-planning/issues/56.

Related to https://github.com/rapidsai/rapids-cmake/pull/592

Notes for Reviewers

This is not ready for review yet.

Related conversations:

  • #1177

AyodeAwe avatar Mar 25 '24 19:03 AyodeAwe

This is currently failing with the following conflicts.

(I've included just 1 example of each type below)

(CUDA 11.8 build) (CUDA 12.2 build)

  1. fmt headers in include/fmt (conflicting packages: conda-forge/fmt, librmm)
This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::fmt-10.2.1-h2a328a1_0, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'include/fmt/chrono.h'
  2. fmt build scripts in lib/cmake/fmt/ (conflicting packages: conda-forge/fmt, librmm)
This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::fmt-10.2.1-h2a328a1_0, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'lib/cmake/fmt/fmt-targets.cmake'
  3. fmt pkgconfig script (conflicting packages: conda-forge/fmt, librmm)
This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::fmt-10.2.1-h2a328a1_0, file:///tmp/conda-bld-output/linux-aarch64::librmm-
  path: 'lib/pkgconfig/fmt.pc'
  4. spdlog headers (conflicting packages: conda-forge/spdlog, librmm)
This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::spdlog-1.12.0-h6b8df57_2, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'include/spdlog/async.h'
  5. spdlog build scripts (conflicting packages: conda-forge/spdlog, librmm)
This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::spdlog-1.12.0-h6b8df57_2, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'lib/cmake/spdlog/spdlogConfig.cmake'
  6. spdlog pkgconfig script (conflicting packages: conda-forge/spdlog, librmm)
This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::spdlog-1.12.0-h6b8df57_2, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'lib/pkgconfig/spdlog.pc'
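
Since the full log repeats this pattern thousands of times, it can help to collapse the warnings into one count per directory. Below is a rough sketch of one way to do that, assuming the warnings always appear as a "packages:" line followed by a "path:" line as in the excerpts above. The function name summarize_clobbers and the choice to group by the first two path components are my own, not anything conda provides.

```python
import re
from collections import defaultdict

# Match the two detail lines printed for each shared-path conflict.
PACKAGES_RE = re.compile(r"^\s*packages:\s*(.+)$")
PATH_RE = re.compile(r"^\s*path:\s*'([^']+)'")


def summarize_clobbers(log_text):
    """Group conflicting paths by their first two directory components."""
    groups = defaultdict(list)
    packages = None
    for line in log_text.splitlines():
        m = PACKAGES_RE.match(line)
        if m:
            packages = tuple(p.strip() for p in m.group(1).split(","))
            continue
        m = PATH_RE.match(line)
        if m and packages is not None:
            path = m.group(1)
            prefix = "/".join(path.split("/")[:2])
            groups[prefix].append((path, packages))
            packages = None  # wait for the next "packages:" line
    return dict(groups)


# Two warnings copied from the log excerpts in this thread.
sample = """\
This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::fmt-10.2.1-h2a328a1_0, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'include/fmt/chrono.h'
This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::spdlog-1.12.0-h6b8df57_2, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'lib/pkgconfig/spdlog.pc'
"""

summary = summarize_clobbers(sample)
for prefix, entries in sorted(summary.items()):
    print(prefix, len(entries))
```

Feeding it a whole CI log should reduce the noise to a handful of prefixes, which is roughly how the six categories above were picked out by hand.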

jameslamb avatar Apr 19 '24 21:04 jameslamb

I can now see this clobbering blocking RMM PRs such as #1537. Can't build RMM C++ in CI due to path conflicts for spdlog and fmt. e.g.

ClobberWarning: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::fmt-10.2.1-h00ab1b0_0, file:///tmp/conda-bld-output/linux-64::librmm-24.06.00a20-cuda11_240423_ga4d6c965_20
  path: 'include/fmt/args.h'

harrism avatar Apr 23 '24 20:04 harrism

@harrism I don't think the clobbering stuff is what's causing that PR to fail (although it is generating thousands of lines of scary-looking logs 😅 ).

#1537 is failing because it removes a file but not the corresponding test in the conda recipe.

+ test -f /opt/conda/conda-bld/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/include/rmm/thrust_rmm_allocator.h
WARNING: Tests failed for librmm-24.06.00a20-cuda12_240423_ga4d6c965_20.tar.bz2 - moving package to /opt/conda/conda-bld/broken

Remove that test from the recipe here:

https://github.com/rapidsai/rmm/blob/9e6db746f1a4a6361fb9fadf381f749dc52faaea/conda/recipes/librmm/meta.yaml#L84
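
For anyone who wants to check this locally before pushing, the recipe test boils down to a `test -f` on each expected header under the install prefix. Here is a minimal Python sketch of the same check. The helper name missing_files, the temporary directory standing in for the test environment, and the placeholder header device_buffer.hpp are assumptions for illustration; only thrust_rmm_allocator.h comes from the failing log.

```python
import tempfile
from pathlib import Path


def missing_files(prefix, relative_paths):
    """Return the relative paths that are not regular files under prefix."""
    root = Path(prefix)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# Hypothetical install prefix: one header is present, the other was removed.
with tempfile.TemporaryDirectory() as tmp:
    header_dir = Path(tmp) / "include" / "rmm"
    header_dir.mkdir(parents=True)
    (header_dir / "device_buffer.hpp").write_text("// placeholder\n")

    expected = [
        "include/rmm/device_buffer.hpp",
        "include/rmm/thrust_rmm_allocator.h",
    ]
    missing = missing_files(tmp, expected)
    print(missing)
```

Any path that shows up in the result is one whose check would need to be dropped from the recipe (or whose file would need to be restored).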

jameslamb avatar Apr 23 '24 20:04 jameslamb

Oh how did you even find that? I just saw all the ClobberWarnings.

harrism avatar Apr 23 '24 20:04 harrism

Any way to make those failures fail with the text "error:"? This is usually what I search for in the logs.

harrism avatar Apr 23 '24 20:04 harrism

> Oh how did you even find that? I just saw all the ClobberWarnings.

I went straight to the end of the log and read back up from there until I saw something problematic. A lot of the CI scripts across RAPIDS run with set -e -u -o pipefail, so they tend to fail at the first place where something goes wrong.

I also had just looked at these tests in the conda recipe today, in the process of testing for this PR, so had some pattern recognition for what it looked like when they failed.

> Any way to make those failures fail with the text "error:"? This is usually what I search for in the logs.

Not that I'm aware of. That message comes from within conda itself, so I don't think we can control it.

It does say "fail", which I usually search for along with "error".

...
WARNING: Tests failed for librmm-24.06.00a20-cuda11_240423_ga4d6c965_20.tar.bz2 - moving package to /opt/conda/conda-bld/broken
...
TESTS FAILED: librmm-24.06.00a20-cuda11_240423_ga4d6c965_20.tar.bz2
[rapids-conda-retry] conda returned exit code: 1

(build link)
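
If it's useful, the "read from the bottom and look for fail or error" approach can be roughly automated. This is only a sketch under a couple of assumptions: the log has been saved as text, and ClobberWarning lines can be skipped because they are warnings rather than the cause of the failure. The function name last_failure_lines and its parameters are made up for this example.

```python
def last_failure_lines(log_text, markers=("fail", "error"), limit=5):
    """Walk the log bottom-up and collect lines that mention a marker."""
    hits = []
    for line in reversed(log_text.splitlines()):
        lowered = line.lower()
        if "clobberwarning" in lowered:
            continue  # noisy, and not what made the job fail here
        if any(marker in lowered for marker in markers):
            hits.append(line.strip())
            if len(hits) == limit:
                break
    return hits


# Lines copied from the log excerpts in this thread.
sample_log = """\
ClobberWarning: This transaction has incompatible packages due to a shared path.
  path: 'include/fmt/args.h'
WARNING: Tests failed for librmm-24.06.00a20-cuda11_240423_ga4d6c965_20.tar.bz2 - moving package to /opt/conda/conda-bld/broken
TESTS FAILED: librmm-24.06.00a20-cuda11_240423_ga4d6c965_20.tar.bz2
[rapids-conda-retry] conda returned exit code: 1
"""

hits = last_failure_lines(sample_log)
for hit in hits:
    print(hit)
```

The hits come back in bottom-up order, so the first one is the failure closest to the end of the log.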

jameslamb avatar Apr 23 '24 20:04 jameslamb

Will this make it into 24.06?

harrism avatar May 09 '24 01:05 harrism

> Will this make it into 24.06?

short answer

Not unless we decide that there's an urgent need for it.

long answer

The root cause of these fmt and spdlog clobbering issues across RAPIDS is "RAPIDS is carrying around patches to those libraries, so rapids-cmake always downloads them, and it places them at likely-to-cause-conflicts paths like include/fmt".
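
To make the mechanism concrete: a clobber warning is what you get when two packages ship a file at the same relative path. A toy sketch of that overlap check follows. The two file lists are illustrative, assembled from paths mentioned in this thread, and are not the real package manifests; shared_paths is a hypothetical helper.

```python
def shared_paths(manifest_a, manifest_b):
    """Return the relative paths that both packages would install."""
    return sorted(set(manifest_a) & set(manifest_b))


# Illustrative file lists only, not the actual package contents.
fmt_files = [
    "include/fmt/chrono.h",
    "lib/cmake/fmt/fmt-targets.cmake",
    "lib/pkgconfig/fmt.pc",
]
librmm_files = [
    "include/rmm/thrust_rmm_allocator.h",
    "include/fmt/chrono.h",  # bundled copy of fmt downloaded at build time
    "lib/pkgconfig/fmt.pc",
]

conflicts = shared_paths(fmt_files, librmm_files)
print(conflicts)
```

Whichever package is installed second overwrites the first one's copy of each shared file, which is why this matters when the bundled copy carries patches.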

I'd started pursuing a short-term fix (upgrade to newer versions of fmt and spdlog that don't need the patches), described in https://github.com/rapidsai/build-planning/issues/56 and tested over in #1544.

I stopped short of rolling that out across all of RAPIDS' conda packages, because doing so might lead to RAPIDS packages conflicting with conda itself and with other packages from conda-forge. @bdice summarized that well here: https://github.com/rapidsai/build-planning/issues/56#issuecomment-2087365946

At that point, we paused on this to work towards other packaging priorities for this release: https://github.com/rapidsai/build-planning/issues/54#issuecomment-2093814987

I'd like to pick up a more permanent solution (RAPIDS redistributing these things when necessary, via its own conda package built from rapids-cmake) in the next release cycle.

cc @mmccarty for visibility

jameslamb avatar May 09 '24 21:05 jameslamb

Thanks. Moving to backlog.

harrism avatar May 22 '24 01:05 harrism

This work is paused, in favor of pursuing a better long-term solution in the future. Closing this PR for now.

Subscribe to https://github.com/rapidsai/build-planning/issues/54 and https://github.com/rapidsai/build-planning/issues/56 for updates.

jameslamb avatar Jul 18 '24 16:07 jameslamb