horovod
Building wheel for horovod (setup.py) ... error
Environment:
- Framework: TensorFlow
- Framework version: 2.12.0
- Horovod version: horovod-0.27.0
- MPI version: openmpi-4.1.5
- CUDA version: 11.8
- NCCL version:
- Python version: 3.9
- Spark / PySpark version:
- Ray version:
- OS and version: WSL-Ubuntu 18
- GCC version:
- CMake version:
Checklist:
- Did you search issues to find if somebody asked this question before?
- If your question is about hang, did you read this doc?
- If your question is about docker, did you read this doc?
- Did you check if your question is answered in the troubleshooting guide?
Bug report:
Please describe erroneous behavior you're observing and steps to reproduce it.
When I execute "pip install horovod" or "HOROVOD_GPU_OPERATIONS=NCCL pip install horovod", there is an error as follows:
@JJDawn Did you solve it yet?
@JJDawn Having the same issue here... 😞
It looks like a problem with gloo FTBFS
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
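The compiler output quoted in this comment is long, and the part shown consists largely of `-Wsign-compare` warnings, which do not by themselves stop a build. A minimal sketch of one way to separate hard errors from warnings in a saved build log is given here. `find_build_errors` and `sample_log` are hypothetical names introduced for illustration, not part of Horovod or pip, and the `error:` line in the sample is made up to show the match rather than taken from this report. The filter assumes GCC/Clang-style diagnostics of the form `file:line:col: error: ...`.

```python
import re

# Hypothetical helper, not part of Horovod or pip.
# Assumption: diagnostics follow the GCC/Clang "file:line:col: error: ..." shape,
# so "error:" preceded by a word boundary marks a hard error. Lines tagged
# "warning:" or "note:" are skipped, as is pip's own "... error" summary line
# (no colon after "error").
ERROR_PATTERN = re.compile(r"\berror:")


def find_build_errors(log_text):
    """Return (line_number, line) pairs for lines that report a compiler error."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Illustrative input: the first two lines mirror the kind of diagnostics in the
# log; the third line is invented for this example.
sample_log = """\
allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness [-Wsign-compare]
logging.h:119:11: note: in definition of macro 'BINARY_COMP_HELPER'
example.cc:10:5: error: 'foo' was not declared in this scope
"""

for number, line in find_build_errors(sample_log):
    print(f"{number}: {line}")
```

Running the same function over the complete output of a verbose install (for example, one captured from `pip install -v horovod`) should narrow the log to the lines that actually caused the wheel build to fail, assuming the failure comes from the compiler rather than from CMake configuration.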
cd /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/third_party/gloo/gloo && /usr/bin/c++ -DEIGEN_MPL2_ONLY=1 -D_GLIBCXX_USE_CXX11_ABI=1 -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/HTTPRequest/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/assert/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/config/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/core/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/detail/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/iterator/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/lockfree/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/mpl/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/parameter/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/predef/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/preprocessor/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/static_assert/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/type_traits/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/boost/utility/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/lbfgs/include -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo -I/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/third_party/gloo -pthread -fPIC -Wall 
-ftree-vectorize -mf16c -mavx -mfma -std=c++11 -fPIC -O3 -g -DNDEBUG -std=c++14 -o CMakeFiles/gloo.dir/allreduce_local.cc.o -c /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc
In file included from /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/transport/pair.h:13,
from /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/context.h:15,
from /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allgather.h:11,
from /tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/compatible_gloo/gloo/allgather.cc:9:
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/common/logging.h: In instantiation of ‘gloo::enforce_detail::EnforceFailMessage gloo::enforce_detail::Equals(const T1&, const T2&) [with T1 = long unsigned int; T2 = int]’:
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/compatible_gloo/gloo/allgather.cc:41:5: required from here
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/common/logging.h:124:28: warning: comparison of integer expressions of different signedness: ‘const long unsigned int’ and ‘const int’ [-Wsign-compare]
119 | if (x op y) { \
| ~~~~~~
......
124 | BINARY_COMP_HELPER(Equals, ==)
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/common/logging.h:119:11: note: in definition of macro ‘BINARY_COMP_HELPER’
119 | if (x op y) { \
| ^~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = signed char]’:
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:43:1: required from here
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<signed char*, std::allocator<signed char*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
31 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:35:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<signed char*, std::allocator<signed char*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
35 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = unsigned char]’:
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:44:1: required from here
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<unsigned char*, std::allocator<unsigned char*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
31 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:35:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<unsigned char*, std::allocator<unsigned char*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
35 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = int]’:
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:45:1: required from here
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<int*, std::allocator<int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
31 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:35:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<int*, std::allocator<int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
35 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = long int]’:
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:46:1: required from here
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<long int*, std::allocator<long int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
31 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:35:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<long int*, std::allocator<long int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
35 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc: In instantiation of ‘void gloo::AllreduceLocal<T>::run() [with T = long unsigned int]’:
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:47:1: required from here
/tmp/pip-install-ail2zx31/horovod_a67f3e8018584c5bb237dd38a65ab5db/third_party/gloo/gloo/allreduce_local.cc:31:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<long unsigned int*, std::allocator<long unsigned int*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
31 | for (int i = 1; i < ptrs_.size(); i++) {
| ~~^~~~~~~~~~~~~~