
not working with grpc-1.53.0 & server still waiting after finish training

Open lidh15 opened this issue 2 years ago • 8 comments

The documentation mentions that grpc versions earlier than 1.50 may not work. I used the latest release, 1.53, and running make throws these errors:

[ 20%] Building CXX object src/FedTree/CMakeFiles/FedTree_DIST.dir/scikit_fedtree.cpp.o
In file included from /usr/local/include/absl/base/config.h:86,
                 from /usr/local/include/absl/base/const_init.h:25,
                 from /usr/local/include/absl/synchronization/mutex.h:67,
                 from /usr/local/include/grpcpp/impl/sync.h:30,
                 from /usr/local/include/grpcpp/impl/codegen/sync.h:25,
                 from /usr/local/include/grpcpp/completion_queue.h:43,
                 from /usr/local/include/grpcpp/channel.h:25,
                 from /usr/local/include/grpcpp/grpcpp.h:52,
                 from /workspace/FedTree/include/FedTree/FL/distributed_party.h:8,
                 from /workspace/FedTree/src/FedTree/FL/distributed_party.cpp:5:
/usr/local/include/absl/base/policy_checks.h:79:2: error: #error "C++ versions less than C++14 are not supported."
   79 | #error "C++ versions less than C++14 are not supported."
      |  ^~~~~
In file included from /usr/local/include/absl/base/config.h:86,
                 from /usr/local/include/absl/base/const_init.h:25,
                 from /usr/local/include/absl/synchronization/mutex.h:67,
                 from /usr/local/include/grpcpp/impl/sync.h:30,
                 from /usr/local/include/grpcpp/impl/codegen/sync.h:25,
                 from /usr/local/include/grpcpp/completion_queue.h:43,
                 from /usr/local/include/grpcpp/channel.h:25,
                 from /usr/local/include/grpcpp/grpcpp.h:52,
                 from /workspace/FedTree/include/FedTree/FL/distributed_server.h:8,
                 from /workspace/FedTree/src/FedTree/FL/distributed_server.cpp:5:
/usr/local/include/absl/base/policy_checks.h:79:2: error: #error "C++ versions less than C++14 are not supported."
   79 | #error "C++ versions less than C++14 are not supported."
      |  ^~~~~
In file included from /usr/local/include/absl/time/time.h:88,
                 from /usr/local/include/absl/time/clock.h:26,
                 from /usr/local/include/absl/synchronization/internal/kernel_timeout.h:35,
                 from /usr/local/include/absl/synchronization/mutex.h:74,
                 from /usr/local/include/grpcpp/impl/sync.h:30,
                 from /usr/local/include/grpcpp/impl/codegen/sync.h:25,
                 from /usr/local/include/grpcpp/completion_queue.h:43,
                 from /usr/local/include/grpcpp/channel.h:25,
                 from /usr/local/include/grpcpp/grpcpp.h:52,
                 from /workspace/FedTree/include/FedTree/FL/distributed_party.h:8,
                 from /workspace/FedTree/src/FedTree/FL/distributed_party.cpp:5:
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::remove_prefix(absl::lts_20230125::string_view::size_type) const’:
/usr/local/include/absl/strings/string_view.h:340:10: error: assignment of member ‘absl::lts_20230125::string_view::ptr_’ in read-only object
  340 |     ptr_ += n;
      |     ~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:341:13: error: assignment of member ‘absl::lts_20230125::string_view::length_’ in read-only object
  341 |     length_ -= n;
      |     ~~~~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:338:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::remove_prefix(absl::lts_20230125::string_view::size_type) const’
  338 |   constexpr void remove_prefix(size_type n) {
      |                  ^~~~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::remove_suffix(absl::lts_20230125::string_view::size_type) const’:
/usr/local/include/absl/strings/string_view.h:350:13: error: assignment of member ‘absl::lts_20230125::string_view::length_’ in read-only object
  350 |     length_ -= n;
      |     ~~~~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:348:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::remove_suffix(absl::lts_20230125::string_view::size_type) const’
  348 |   constexpr void remove_suffix(size_type n) {
      |                  ^~~~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::swap(absl::lts_20230125::string_view&) const’:
/usr/local/include/absl/strings/string_view.h:358:13: error: passing ‘const absl::lts_20230125::string_view’ as ‘this’ argument discards qualifiers [-fpermissive]
  358 |     *this = s;
      |             ^
/usr/local/include/absl/strings/string_view.h:161:7: note:   in call to ‘absl::lts_20230125::string_view& absl::lts_20230125::string_view::operator=(const absl::lts_20230125::string_view&)’
  161 | class string_view {
      |       ^~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h:356:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::swap(absl::lts_20230125::string_view&) const’
  356 |   constexpr void swap(string_view& s) noexcept {
      |                  ^~~~
In file included from /usr/local/include/absl/time/time.h:88,
                 from /usr/local/include/absl/time/clock.h:26,
                 from /usr/local/include/absl/synchronization/internal/kernel_timeout.h:35,
                 from /usr/local/include/absl/synchronization/mutex.h:74,
                 from /usr/local/include/grpcpp/impl/sync.h:30,
                 from /usr/local/include/grpcpp/impl/codegen/sync.h:25,
                 from /usr/local/include/grpcpp/completion_queue.h:43,
                 from /usr/local/include/grpcpp/channel.h:25,
                 from /usr/local/include/grpcpp/grpcpp.h:52,
                 from /workspace/FedTree/include/FedTree/FL/distributed_server.h:8,
                 from /workspace/FedTree/src/FedTree/FL/distributed_server.cpp:5:
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::remove_prefix(absl::lts_20230125::string_view::size_type) const’:
/usr/local/include/absl/strings/string_view.h:340:10: error: assignment of member ‘absl::lts_20230125::string_view::ptr_’ in read-only object
  340 |     ptr_ += n;
      |     ~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:341:13: error: assignment of member ‘absl::lts_20230125::string_view::length_’ in read-only object
  341 |     length_ -= n;
      |     ~~~~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:338:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::remove_prefix(absl::lts_20230125::string_view::size_type) const’
  338 |   constexpr void remove_prefix(size_type n) {
      |                  ^~~~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::remove_suffix(absl::lts_20230125::string_view::size_type) const’:
/usr/local/include/absl/strings/string_view.h:350:13: error: assignment of member ‘absl::lts_20230125::string_view::length_’ in read-only object
  350 |     length_ -= n;
      |     ~~~~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:348:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::remove_suffix(absl::lts_20230125::string_view::size_type) const’
  348 |   constexpr void remove_suffix(size_type n) {
      |                  ^~~~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::swap(absl::lts_20230125::string_view&) const’:
/usr/local/include/absl/strings/string_view.h:358:13: error: passing ‘const absl::lts_20230125::string_view’ as ‘this’ argument discards qualifiers [-fpermissive]
  358 |     *this = s;
      |             ^
/usr/local/include/absl/strings/string_view.h:161:7: note:   in call to ‘absl::lts_20230125::string_view& absl::lts_20230125::string_view::operator=(const absl::lts_20230125::string_view&)’
  161 | class string_view {
      |       ^~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h:356:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::swap(absl::lts_20230125::string_view&) const’
  356 |   constexpr void swap(string_view& s) noexcept {
      |                  ^~~~
[ 21%] Linking CXX shared library ../../lib/libFedTree.so
make[2]: *** [src/FedTree/CMakeFiles/FedTree_DIST.dir/build.make:146: src/FedTree/CMakeFiles/FedTree_DIST.dir/FL/distributed_party.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/usr/bin/ld: /usr/local/lib/libntl.a(ZZ.o): relocation R_X86_64_TPOFF32 against `_ZN3NTLL8iodigitsE' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(fileio.o): relocation R_X86_64_TPOFF32 against `_ZZN3NTL8UniqueIDB5cxx11EvE37_ntl_hidden_variable_tls_local_ptr_ID' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(lip.o): relocation R_X86_64_TPOFF32 against `_ZZ10_ntl_gswapPP17_ntl_gbigint_bodyS1_E36_ntl_hidden_variable_tls_local_ptr_t' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(tools.o): relocation R_X86_64_TPOFF32 against symbol `_ZN3NTL16ErrorMsgCallbackE' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(thread.o): relocation R_X86_64_TPOFF32 against `_ZZN3NTL15CurrentThreadIDB5cxx11EvE37_ntl_hidden_variable_tls_local_ptr_ID' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(BasicThreadPool.o): relocation R_X86_64_TPOFF32 against `_ZZN3NTLL49_ntl_hidden_function_tls_access_NTLThreadPool_stgEvE52_ntl_hidden_variable_tls_local_ptr_NTLThreadPool_stg' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(lip.o): warning: relocation against `_ZTV21_ntl_tmp_vec_crt_fast' in read-only section `.text'
collect2: error: ld returned 1 exit status
make[2]: *** [src/FedTree/CMakeFiles/FedTree.dir/build.make:551: lib/libFedTree.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:154: src/FedTree/CMakeFiles/FedTree.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make[2]: *** [src/FedTree/CMakeFiles/FedTree_DIST.dir/build.make:160: src/FedTree/CMakeFiles/FedTree_DIST.dir/FL/distributed_server.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:232: src/FedTree/CMakeFiles/FedTree_DIST.dir/all] Error 2
[ 23%] Linking CXX static library ../../lib/libft_grpc_proto.a
[ 24%] Built target ft_grpc_proto
make: *** [Makefile:91: all] Error 2
It seems to come from the latest absl.
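
For what it's worth, the `constexpr void ...` errors in the log above are what C++11 rules would produce for these headers: in C++11 a `constexpr` member function is implicitly `const` and cannot return `void`, and both restrictions were lifted in C++14. A minimal sketch that reproduces the pattern, using a hypothetical `tiny_view` stand-in rather than the real `absl::string_view`:

```cpp
#include <cstddef>

// Hypothetical stand-in for a string_view-like type (NOT absl::string_view).
struct tiny_view {
    const char* ptr_;
    std::size_t length_;

    // Mutating constexpr member returning void: valid from C++14 on.
    // Under C++11 this member would be implicitly const and 'void' would
    // not be an allowed constexpr return type.
    constexpr void remove_prefix(std::size_t n) {
        ptr_ += n;
        length_ -= n;
    }
};

// Uses the mutating member inside a constant expression.
constexpr std::size_t length_after_prefix(std::size_t n) {
    tiny_view v{"FedTree", 7};
    v.remove_prefix(n);
    return v.length_;
}

// Same kind of guard that the absl policy check in the log enforces.
static_assert(__cplusplus >= 201402L, "needs C++14 or later");
static_assert(length_after_prefix(3) == 4, "evaluated at compile time");
```

This should compile as C++14 or newer and be rejected as C++11 with diagnostics of the same kind as the ones reported for `string_view.h`.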

lidh15 avatar Mar 30 '23 06:03 lidh15

Okay, it's not about absl: updating CMakeLists.txt from C++11 to C++14 fixed that. The remaining problem is with zlib; the errors are:

/usr/bin/ld: /usr/local/lib/libgrpc.a(message_compress.cc.o): in function `zlib_compress(grpc_slice_buffer*, grpc_slice_buffer*, int)':
message_compress.cc:(.text+0x541): undefined reference to `deflateInit2_'
/usr/bin/ld: message_compress.cc:(.text+0x58b): undefined reference to `deflate'
/usr/bin/ld: message_compress.cc:(.text+0x660): undefined reference to `deflateEnd'
/usr/bin/ld: /usr/local/lib/libgrpc.a(message_compress.cc.o): in function `zlib_decompress(grpc_slice_buffer*, grpc_slice_buffer*, int)':
message_compress.cc:(.text+0x701): undefined reference to `inflateInit2_'
/usr/bin/ld: message_compress.cc:(.text+0x747): undefined reference to `inflate'
/usr/bin/ld: message_compress.cc:(.text+0x7ee): undefined reference to `inflateEnd'
collect2: error: ld returned 1 exit status
make[2]: *** [src/FedTree/CMakeFiles/FedTree-distributed-party.dir/build.make:164: bin/FedTree-distributed-party] Error 1
make[1]: *** [CMakeFiles/Makefile2:259: src/FedTree/CMakeFiles/FedTree-distributed-party.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: /usr/local/lib/libgrpc.a(message_compress.cc.o): in function `zlib_compress(grpc_slice_buffer*, grpc_slice_buffer*, int)':
message_compress.cc:(.text+0x541): undefined reference to `deflateInit2_'
/usr/bin/ld: message_compress.cc:(.text+0x58b): undefined reference to `deflate'
/usr/bin/ld: message_compress.cc:(.text+0x660): undefined reference to `deflateEnd'
/usr/bin/ld: /usr/local/lib/libgrpc.a(message_compress.cc.o): in function `zlib_decompress(grpc_slice_buffer*, grpc_slice_buffer*, int)':
message_compress.cc:(.text+0x701): undefined reference to `inflateInit2_'
/usr/bin/ld: message_compress.cc:(.text+0x747): undefined reference to `inflate'
/usr/bin/ld: message_compress.cc:(.text+0x7ee): undefined reference to `inflateEnd'
collect2: error: ld returned 1 exit status
make[2]: *** [src/FedTree/CMakeFiles/FedTree-distributed-server.dir/build.make:164: bin/FedTree-distributed-server] Error 1
make[1]: *** [CMakeFiles/Makefile2:286: src/FedTree/CMakeFiles/FedTree-distributed-server.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

lidh15 avatar Mar 30 '23 10:03 lidh15

Hi @lidh15 ,

We use grpc 1.50.0 to generate the proto files. If you use a version other than 1.50.0, you may need to go to src/FedTree/grpc directory and run the following commands. Then you can try to compile the library. Thank you!

protoc -I ./ --grpc_out=. --plugin=protoc-gen-grpc=`which grpc_cpp_plugin` ./fedtree.proto
protoc -I ./ --cpp_out=. ./fedtree.proto

QinbinLi avatar Mar 30 '23 16:03 QinbinLi

okay, I'll try.

lidh15 avatar Mar 30 '23 16:03 lidh15

I don't know whether it is okay to discuss this here or whether I should open a new issue: why doesn't the distributed server exit after a vertical GBDT training process finishes? I know that in the original horizontal federated learning architecture the server is treated as a long-running service, but in vertical scenarios the "server" is usually also a "party" that just holds the labels. Would it be possible for "distributed-party" to take over the server's job and exit after a training task?

lidh15 avatar Mar 30 '23 17:03 lidh15

Thank you for this great suggestion! Indeed it'd be better if the server stops automatically when the task is over. We'll fix it in the future.
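
One way this could work, sketched under assumptions: the server counts the parties that have reported completion and, once all of them have, triggers a shutdown. In gRPC C++ terms that would mean calling `Server::Shutdown()` from another thread so that the blocking `Server::Wait()` returns. The `CompletionTracker` class below is hypothetical and not part of FedTree; it only shows the bookkeeping, not the gRPC wiring.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical helper (not FedTree code): remembers which parties have
// reported that their training task is over.
class CompletionTracker {
public:
    explicit CompletionTracker(std::size_t n_parties) : n_parties_(n_parties) {}

    // Record that `party_id` finished. Returns true once every party has
    // reported, which is the point where a server could shut itself down.
    bool mark_finished(int party_id) {
        std::lock_guard<std::mutex> lock(mu_);
        finished_.insert(party_id);  // a set, so duplicate reports count once
        return finished_.size() >= n_parties_;
    }

    bool all_finished() const {
        std::lock_guard<std::mutex> lock(mu_);
        return finished_.size() >= n_parties_;
    }

private:
    const std::size_t n_parties_;
    mutable std::mutex mu_;  // RPC handlers may run concurrently
    std::set<int> finished_;
};

// Small usage example with two parties.
void self_check() {
    CompletionTracker tracker(2);
    assert(!tracker.mark_finished(0));  // one party still running
    assert(!tracker.mark_finished(0));  // duplicate report changes nothing
    assert(tracker.mark_finished(1));   // last party done: safe to shut down
}
```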

QinbinLi avatar Mar 30 '23 23:03 QinbinLi

Regenerating the proto files with the two protoc commands you suggested didn't help.

lidh15 avatar Mar 31 '23 03:03 lidh15

One more question: how many bits does N have in the Paillier HE used for vertical GBDT? Typically it is 2048, but I didn't find this in the documentation.

lidh15 avatar Mar 31 '23 06:03 lidh15

512 bits are used in the default setting. I just added the parameter key_length so that users can control the bits. Please refer to https://fedtree.readthedocs.io/en/latest/Parameters.html for details.
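
To make "bits of N" concrete: in Paillier the public modulus is N = p·q, and the key length is the bit length of N. The toy sketch below uses two tiny primes so that everything fits in 64-bit integers. It is not FedTree's implementation, all names in it are hypothetical, and a key this small offers no security; it only illustrates what the number of bits measures and that ciphertexts can be added.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Greatest common divisor (Euclid).
u64 gcd_u64(u64 a, u64 b) {
    while (b != 0) { u64 t = a % b; a = b; b = t; }
    return a;
}

// Modular exponentiation by squaring. All operands in this toy stay below
// 2^16, so the 64-bit products cannot overflow.
u64 pow_mod(u64 base, u64 exp, u64 mod) {
    u64 result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Number of bits needed to write n in binary: the "key length" when n = N.
unsigned bit_length(u64 n) {
    unsigned bits = 0;
    while (n > 0) { ++bits; n >>= 1; }
    return bits;
}

struct ToyKey { u64 n, n2, lambda, mu; };

// Build a toy key from two small distinct primes, with generator g = N + 1.
ToyKey make_toy_key(u64 p, u64 q) {
    ToyKey k{};
    k.n = p * q;
    k.n2 = k.n * k.n;
    k.lambda = (p - 1) / gcd_u64(p - 1, q - 1) * (q - 1);  // lcm(p-1, q-1)
    for (u64 x = 1; x < k.n; ++x) {                        // mu = lambda^-1 mod N
        if (k.lambda % k.n * x % k.n == 1) { k.mu = x; break; }
    }
    return k;
}

// c = g^m * r^N mod N^2, with g = N + 1 so that g^m = 1 + m*N (mod N^2).
// Requires m < N and gcd(r, N) == 1.
u64 encrypt(const ToyKey& k, u64 m, u64 r) {
    return (1 + m * k.n) % k.n2 * pow_mod(r, k.n, k.n2) % k.n2;
}

// m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) / N.
u64 decrypt(const ToyKey& k, u64 c) {
    return (pow_mod(c, k.lambda, k.n2) - 1) / k.n * k.mu % k.n;
}

void self_check() {
    ToyKey k = make_toy_key(11, 13);   // N = 143
    assert(bit_length(k.n) == 8);      // an "8-bit key" in this toy setting
    u64 c1 = encrypt(k, 20, 2), c2 = encrypt(k, 22, 3);
    assert(decrypt(k, c1 * c2 % k.n2) == 42);  // ciphertext product adds plaintexts
}
```

With a 512-bit key, N itself has 512 bits, so p and q are each roughly 256 bits and the arithmetic needs a big-integer library instead of `uint64_t`.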

For grpc 1.53.0, I have no idea why it fails. I'm considering adding a feature to automatically install a fixed version of grpc when compiling FedTree to avoid the grpc compatibility issue.

QinbinLi avatar Apr 01 '23 01:04 QinbinLi