Building wheel for horovod (setup.py) ... error

Open JJDawn opened this issue 1 year ago • 3 comments

Environment:

  1. Framework: TensorFlow

  2. Framework version: 2.12.0

  3. Horovod version: horovod-0.27.0

  4. MPI version: openmpi-4.1.5

  5. CUDA version: 11.8

  6. NCCL version: (given as a screenshot; not preserved)

  7. Python version: 3.9

  8. Spark / PySpark version:

  9. Ray version:

  10. OS and version: WSL-Ubuntu 18

  11. GCC version:

  12. CMake version:
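
Items 11 and 12 are blank above, although the compiler and CMake versions usually matter when a wheel fails to compile. A minimal Python sketch for collecting them follows. It assumes the tools are on PATH and that each accepts `--version`; the function names `tool_version` and `collect_build_environment` are illustrative and not part of Horovod.

```python
import shutil
import subprocess
import sys


def tool_version(name):
    """Return the first line of `<name> --version`, or None if unavailable."""
    path = shutil.which(name)
    if path is None:
        return None  # tool is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # tool exists but could not be run
    # Some tools print their version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def collect_build_environment(tools=("gcc", "g++", "cmake", "mpirun", "nvcc")):
    """Map each tool name to its version banner (None when not found)."""
    report = {"python": sys.version.split()[0]}
    for name in tools:
        report[name] = tool_version(name)
    return report


if __name__ == "__main__":
    for key, value in collect_build_environment().items():
        print(f"{key}: {value or 'not found'}")
```

The printed lines can be pasted directly into the Environment list of the issue template.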

Checklist:

  1. Did you search issues to find if somebody asked this question before?
  2. If your question is about hang, did you read this doc?
  3. If your question is about docker, did you read this doc?
  4. Did you check if your question is answered in the troubleshooting guide?

Bug report: Please describe erroneous behavior you're observing and steps to reproduce it.

When I execute "pip install horovod" or "HOROVOD_GPU_OPERATIONS=NCCL pip install horovod", there is an error as follows: (two screenshots; not preserved)

JJDawn • May 09 '23 05:05

@JJDawn Did you solve it yet?

AkshayRoyal • Jul 11 '23 11:07

@JJDawn Having the same issue here... 😞

Calvinnncy97 • Sep 15 '23 02:09

It looks like a problem with gloo failing to build from source (FTBFS).

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
      cd /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/third_party/gloo/gloo && /usr/bin/c++ -DEIGEN_MPL2_ONLY=1 -D_GLIBCXX_USE_CXX11_ABI=1 -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/HTTPRequest/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/assert/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/config/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/core/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/detail/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/iterator/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/lockfree/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/mpl/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/parameter/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/predef/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/preprocessor/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/static_assert/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/type_traits/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/utility/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/lbfgs/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/third_party/gloo -pthread -fPIC -Wall 
-ftree-vectorize -mf16c -mavx -mfma -std=c++11 -fPIC -O3 -g -DNDEBUG -std=c++14 -o CMakeFiles/gloo.dir/allreduce_local.cc.o -c /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc
      In file included from /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/transport/pair.h:13,
                       from /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/context.h:15,
                       from /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allgather.h:11,
                       from /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/compatible_gloo/gloo/allgather.cc:9:
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/common/logging.h: In instantiation of ‘gloo::enforce_detail::EnforceFailMessage gloo::enforce_detail::Equals(const T1&, const T2&) [with T1 = long unsigned int; T2 = int]’:
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/compatible_gloo/gloo/allgather.cc:41:5:   required from here
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/common/logging.h:124:28: warning: comparison of integer expressions of different signedness: ‘const long unsigned int’ and ‘const int’ [-Wsign-compare]
        119 |     if (x op y) {                                            \
            |         ~~~~~~
      ......
        124 | BINARY_COMP_HELPER(Equals, ==)
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/common/logging.h:119:11: note: in definition of macro ‘BINARY_COMP_HELPER’
        119 |     if (x op y) {                                            \
            |           ^~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = signed char]’:
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:43:1:   required from here
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<signed char*, std::allocator<signed char*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         31 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:35:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<signed char*, std::allocator<signed char*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         35 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = unsigned char]’:
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:44:1:   required from here
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<unsigned char*, std::allocator<unsigned char*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         31 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:35:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<unsigned char*, std::allocator<unsigned char*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         35 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = int]’:
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:45:1:   required from here
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<int*, std::allocator<int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         31 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:35:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<int*, std::allocator<int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         35 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = long int]’:
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:46:1:   required from here
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<long int*, std::allocator<long int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         31 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:35:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<long int*, std::allocator<long int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         35 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = long unsigned int]’:
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:47:1:   required from here
      /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<long unsigned int*, std::allocator<long unsigned int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
         31 |   for (int i = 1; i < ptrs_.size(); i++) {
            |                   ~~^~~~~~~~~~~~~~

cjac • Jul 09 '24 20:07
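
Every diagnostic in the excerpt above is a -Wsign-compare warning, so the line that actually fails the build is not shown there. One way to find it in a long build log is to separate errors from warnings. The sketch below assumes GCC-style diagnostics of the form `file:line:col: severity: message`; the name `split_diagnostics` is hypothetical, and errors printed in other formats (linker or CMake errors, for example) would be missed.

```python
import re

# GCC-style diagnostic: "<file>:<line>:<col>: <severity>: <message>".
# The column is optional; "fatal error" is listed first so it wins over "error".
DIAGNOSTIC = re.compile(
    r"^\s*(?P<file>[^\s:][^:]*):(?P<line>\d+):(?:(?P<col>\d+):)?"
    r"\s*(?P<severity>fatal error|error|warning):\s*(?P<message>.*)$"
)


def split_diagnostics(log_text):
    """Group compiler diagnostics in a build log into errors and warnings.

    Returns {"error": [...], "warning": [...]}, where each entry is a
    (file, line_number, message) tuple. Lines that are not diagnostics
    (include chains, source snippets, "required from here") are skipped.
    """
    found = {"error": [], "warning": []}
    for raw in log_text.splitlines():
        match = DIAGNOSTIC.match(raw)
        if match is None:
            continue
        # Treat "fatal error" the same as "error".
        severity = "error" if "error" in match.group("severity") else "warning"
        found[severity].append(
            (match.group("file"), int(match.group("line")), match.group("message"))
        )
    return found
```

Saving the output of `pip install -v horovod` to a file and passing its contents to `split_diagnostics` gives a short list of error lines to quote in a report, instead of pages of warnings.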