decision-forests icon indicating copy to clipboard operation
decision-forests copied to clipboard

Bazel won't compile on 0.2.1 due to missing absl::Time include

Open AlessandroFlati opened this issue 3 years ago • 5 comments

Hi all, thanks for all the effort on this project. I'm here just to document some compilation errors on Linux Mint / Ubuntu 20.04, GCC 8, Python 3.8, JDK 11, Bazel 3.7.2, tag 0.2.1:

ERROR: /home/alessandro/.cache/bazel/_bazel_alessandro/12a6f981b14c4d716385955ae691d270/external/ydf/yggdrasil_decision_forests/learner/distributed_decision_tree/load_balancer/BUILD:19:15: C++ compilation of rule '@ydf//yggdrasil_decision_forests/learner/distributed_decision_tree/load_balancer:load_balancer' failed (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 43 argument(s) skipped)
In file included from external/ydf/yggdrasil_decision_forests/learner/distributed_decision_tree/load_balancer/load_balancer.cc:16:
external/ydf/yggdrasil_decision_forests/learner/distributed_decision_tree/load_balancer/load_balancer.h:270:9: error: 'Time' in namespace 'absl' does not name a type
   absl::Time time_last_balancing_;
         ^~~~
external/ydf/yggdrasil_decision_forests/learner/distributed_decision_tree/load_balancer/load_balancer.cc: In member function 'yggdrasil_decision_forests::utils::StatusOr<bool> yggdrasil_decision_forests::model::distributed_decision_tree::LoadBalancer::AddWorkDurationMeasurement(const std::vector<yggdrasil_decision_forests::model::distributed_decision_tree::LoadBalancer::Measure>&)':
external/ydf/yggdrasil_decision_forests/learner/distributed_decision_tree/load_balancer/load_balancer.cc:190:17: error: 'time_last_balancing_' was not declared in this scope
          (now - time_last_balancing_ >=
                 ^~~~~~~~~~~~~~~~~~~~
external/ydf/yggdrasil_decision_forests/learner/distributed_decision_tree/load_balancer/load_balancer.cc:190:17: note: suggested alternative: 'num_measures_last_balancing_'
          (now - time_last_balancing_ >=
                 ^~~~~~~~~~~~~~~~~~~~
                 num_measures_last_balancing_

A simple

#include "absl/time/time.h"

in external/ydf/yggdrasil_decision_forests/learner/distributed_decision_tree/load_balancer/load_balancer.h seems to solve that problem.

Another one arised:

external/ydf/yggdrasil_decision_forests/metric/metric.cc: In function 'void yggdrasil_decision_forests::metric::AddPrediction(const yggdrasil_decision_forests::metric::proto::EvaluationOptions&, const yggdrasil_decision_forests::model::proto::Prediction&, yggdrasil_decision_forests::utils::RandomEngine*, yggdrasil_decision_forests::metric::proto::EvaluationResults*)':
external/ydf/yggdrasil_decision_forests/metric/metric.cc:880:5: error: jump to case label [-fpermissive]
     default:
     ^~~~~~~
In file included from external/ydf/yggdrasil_decision_forests/utils/logging.h:19,
                 from external/ydf/yggdrasil_decision_forests/metric/metric.h:32,
                 from external/ydf/yggdrasil_decision_forests/metric/metric.cc:16:
external/ydf/yggdrasil_decision_forests/utils/logging_default.h:72:28: note:   crosses initialization of 'const absl::lts_20210324::Status status_for876'
   const auto COMBINE_TERMS(status_for, __LINE__) = expr;             \
                            ^~~~~~~~~~
external/ydf/yggdrasil_decision_forests/utils/logging_default.h:66:33: note: in definition of macro 'COMBINE_TERMS_SUB'
 #define COMBINE_TERMS_SUB(X, Y) X##Y  // helper macro
                                 ^
external/ydf/yggdrasil_decision_forests/utils/logging_default.h:72:14: note: in expansion of macro 'COMBINE_TERMS'
   const auto COMBINE_TERMS(status_for, __LINE__) = expr;             \
              ^~~~~~~~~~~~~
external/ydf/yggdrasil_decision_forests/metric/metric.cc:876:7: note: in expansion of macro 'CHECK_OK'
       CHECK_OK(uplift::AddUpliftPredictionImp(option, pred, rnd, eval));

(actually 2/3 of the exact same kind).

This is solved by properly encapsulating cases in respective scopes, in FinalizeEvaluation, InitializeEvaluation and AddPrediction methods in metric.cc.

switch (option.task()) {
    case model::proto::Task::CLASSIFICATION: {
      auto* eval_cls = eval->mutable_classification();
      const int32_t num_classes =
          label_column.categorical().number_of_unique_values();
      if (option.classification().roc_enable()) {
        // Compute the ROCs.
        for (int label_value = 0; label_value < num_classes; label_value++) {
          auto* roc = eval_cls->mutable_rocs()->Add();
          BuildROCCurve(option, label_column, *eval, label_value, roc);
        }
      }
    } break;
    case model::proto::Task::REGRESSION: {
      // Performs bootstrapping.
      if (option.bootstrapping_samples() > 0) {
        LOG(INFO) << "Computing rmse intervals of evaluation metrics with "
                     "bootstrapping.";
        internal::UpdateRMSEConfidenceIntervalUsingBootstrapping(option, eval);
      }
    } break;
    case model::proto::Task::RANKING: {
      FinalizeRankingMetricsFromSampledPredictions(option, eval);
    } break;
    case model::proto::Task::CATEGORICAL_UPLIFT: {
      CHECK_OK(uplift::FinalizeUpliftMetricsFromSampledPredictions(
          option, label_column, eval));
    } break;
    default:
      break;
  }

and analogously in the other methods.

Then, another error.

external/ydf/yggdrasil_decision_forests/model/random_forest/random_forest.cc:548:10:   required from here
external/ydf/yggdrasil_decision_forests/utils/compatibility.h:120:5: error: no matching function for call to 'std::vector<yggdrasil_decision_forests::model::proto::VariableImportance>::vector(std::remove_reference<const yggdrasil_decision_forests::utils::StatusOr<std::vector<yggdrasil_decision_forests::model::proto::VariableImportance> >&>::type)'
     new (&data_) T(std::move(data));
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This was a bit harder to understand, but it's (probably) solved by adding a .status() in return value of RandomForestModel::GetVariableImportance in external/ydf/yggdrasil_decision_forests/model/random_forest/random_forest.cc.

The same error (and solution) was applied to the following

external/ydf/yggdrasil_decision_forests/model/gradient_boosted_trees/gradient_boosted_trees.cc:545:10:   required from here
external/ydf/yggdrasil_decision_forests/utils/compatibility.h:120:5: error: no matching function for call to 'std::vector<yggdrasil_decision_forests::model::proto::VariableImportance>::vector(std::remove_reference<const yggdrasil_decision_forests::utils::StatusOr<std::vector<yggdrasil_decision_forests::model::proto::VariableImportance> >&>::type)'
     new (&data_) T(std::move(data));
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This error was quite a PITA.

external/ydf/yggdrasil_decision_forests/learner/distributed_gradient_boosted_trees/distributed_gradient_boosted_trees.cc:522:14:   required from here
external/ydf/yggdrasil_decision_forests/utils/compatibility.h:120:5: error: no matching function for call to 'std::unique_ptr<yggdrasil_decision_forests::model::gradient_boosted_trees::GradientBoostedTreesModel>::unique_ptr(std::remove_reference<const absl::lts_20210324::Status&>::type)'
     new (&data_) T(std::move(data));
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

the return value had to be a StatusOr, while a Status was being returned, so I had to wrap it and return a tils::StatusOr<std::unique_ptr<gradient_boosted_trees::GradientBoostedTreesModel>>(iter_status). Again,

external/ydf/yggdrasil_decision_forests/utils/distribute/implementations/grpc/grpc_worker.cc: In member function 'absl::lts_20210324::Status yggdrasil_decision_forests::distribute::{anonymous}::WorkerService::BlockingDoneOnWorker()':
external/ydf/yggdrasil_decision_forests/utils/status_macros_default.h:33:12: error: could not convert '_status' from 'grpc::Status' to 'absl::lts_20210324::Status'
     return _status;                                             
            ^~~~~~~
external/ydf/yggdrasil_decision_forests/utils/distribute/implementations/grpc/grpc_worker.cc:174:5: note: in expansion of macro 'RETURN_IF_ERROR'

An unnecessary AbslStatusToGrpcStatus was called from a (already) GrpcStatus.

That led to a complete build.

AlessandroFlati avatar Nov 28 '21 12:11 AlessandroFlati

Also, ./tools/build_pip_package.sh simple didn't work since it requires python3.9. Modifying the bash script with 3.8 would have been fine, except from the fact I had a custom built TensorFlow. In this case I would simply recommend not to --force-reinstall when you pip3 install dist/tensorflow_decision_forests-*-cp${PACKAGE}-cp${PACKAGE}*-linux_x86_64.whl.

Finally, examples/minimal.py is bad indented.

Traceback (most recent call last):
  File "examples/minimal.py", line 33, in <module>
    import tensorflow_decision_forests as tfdf
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/__init__.py", line 53, in <module>
    from tensorflow_decision_forests.tensorflow import check_version
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/tensorflow/check_version.py", line 33
    if tf_version is None:
    ^
IndentationError: expected an indented block

Eventually, I have the same error that I got installing via pip, which is

WARNING:root:Failure to load the custom c++ tensorflow ops. This error is likely caused the version of TensorFlow and TensorFlow Decision Forests are not compatible.
Traceback (most recent call last):
  File "examples/minimal.py", line 33, in <module>
    import tensorflow_decision_forests as tfdf
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/__init__.py", line 57, in <module>
    from tensorflow_decision_forests import keras
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/keras/__init__.py", line 49, in <module>
    from tensorflow_decision_forests.keras import core
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/keras/core.py", line 60, in <module>
    from tensorflow_decision_forests.tensorflow import core as tf_core
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/tensorflow/core.py", line 35, in <module>
    from tensorflow_decision_forests.tensorflow.ops.training import api as training_op
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/tensorflow/ops/training/api.py", line 29, in <module>
    raise e
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/tensorflow/ops/training/api.py", line 26, in <module>
    tf.load_op_library(resource_loader.get_path_to_datafile("training.so"))
  File "/home/alessandro/traiding/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/alessandro/traiding/lib/python3.8/site-packages/tensorflow_decision_forests/tensorflow/ops/training/training.so: undefined symbol: _ZN10tensorflow5ScopeD1Ev

and I have no idea why it doesn't find the linked library, since I just compiled it. Should I add something on the LD_LIBRARY_PATH?

AlessandroFlati avatar Nov 28 '21 12:11 AlessandroFlati

Thanks a lot for the alert and the search of solutions @AlessandroFlati ! :)

Those comments have been integrated in the commits:

  • https://github.com/tensorflow/decision-forests/commit/26a2e6751438d5ab1acc72aea63059c90f18ceee
  • https://github.com/google/yggdrasil-decision-forests/commit/b591941587143fc9c4ab748566e4e22d9fadd2bb

achoum avatar Nov 30 '21 15:11 achoum

No problem, I know too well how hard it takes to retrieve such "structured" feedback on non-standard environments. :) Maybe you have any suggestion for the missing symbol in the linked .so library?

AlessandroFlati avatar Nov 30 '21 16:11 AlessandroFlati

Can you give me some more details:

  • "I have the same error that I got installing via pip, which is" -> Do you mean that you see this error message both from the pip install version and from your own compilation of tf-df?

  • "undefined symbol: _ZN10tensorflow5ScopeD1Ev" -> Is that the entire message? Just to make sure: You have tf2.7.0 installed (the one compatible with tf-df 0.2.1)?

achoum avatar Dec 02 '21 15:12 achoum

The answer is yes at all questions :grin: Actually, I have a custom built TF2.7.0, because of Cuda Capabilities 3.0 which were not natively supported (they actually were, but they forgot to update the CMake, I opened a similar issue in their case) - but with the manually built version of TF I manage to use all functionalities of TF2.7, with CUDA, CUDNN etc. - do you know by any chanche where is located that TensorflowScope symbol? Maybe I forgot to compile a specific submodule of Tensorflow, or I don't know..

AlessandroFlati avatar Dec 02 '21 16:12 AlessandroFlati

Closing as obsolete, TF-DF, TF and bazel have changed significantly since. Feel free to open a new issue if you still have this problem.

rstz avatar Feb 22 '23 09:02 rstz