[CI] Fix Failing Build Since CI updates

Open nedvedba opened this issue 9 months ago • 6 comments

Description

Currently, the CI job responsible for provisioning the DataFed Python Client VM is failing. The protobuf build does not support Python 3.12, does not yet officially support anything newer than Python 3.9, and does not yet account for the deprecation of the pkg_resources package. I have tried bumping the versions of the protobuf and setuptools Python libraries, and every combination leads to the same error. The current plan is to set up a Python 3.11 virtual environment that the install and build scripts can use until a more permanent solution is decided.

I am seeing the problem with both the latest version of protobuf (>=6.30.2) and the currently running version, 5.27.1.

Failing job shown here: https://code.ornl.gov/dlsw/datafed/datafed/-/jobs/3178328

Installing: /shared/install/lib/cmake/protobuf/protobuf-generate.cmake
Requirement already satisfied: numpy in /shared/install/python/datafed/lib/python3.12/site-packages (2.2.4)
Traceback (most recent call last):
  File "/home/gitlab-runner/builds/gYxDkX87B/1/dlsw/datafed/datafed/external/protobuf/python/setup.py", line 16, in <module>
    import pkg_resources
  File "/shared/install/python/datafed/lib/python3.12/site-packages/pkg_resources/__init__.py", line 2191, in <module>
    register_finder(pkgutil.ImpImporter, find_on_path)
                    ^^^^^^^^^^^^^^^^^^^
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit status 1

Here is the relevant MR for the CI repository on installing a separate version of Python. https://code.ornl.gov/dlsw/datafed/ci/-/merge_requests/124
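
The traceback above ends in pkg_resources reaching for `pkgutil.ImpImporter`, which was removed from the standard library in Python 3.12. As a minimal sketch (the function names are made up for illustration and are not part of the DataFed scripts), a preflight check like the following could be run with the interpreter the build will use, to confirm whether an older pkg_resources is likely to hit the same error:

```python
import pkgutil
import sys


def has_imp_importer():
    """Return True if this interpreter still provides pkgutil.ImpImporter."""
    return hasattr(pkgutil, "ImpImporter")


def describe_interpreter():
    """Build a one-line summary of the interpreter and the ImpImporter status."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    if has_imp_importer():
        status = "pkgutil.ImpImporter is present"
    else:
        status = "pkgutil.ImpImporter is missing; older pkg_resources will fail to import"
    return "Python {}: {}".format(version, status)


if __name__ == "__main__":
    print(describe_interpreter())
```

This only inspects the standard library, so it works even when setuptools itself cannot be imported.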

Acceptance

CI runs without failing after upgrading to Ubuntu 24.

  • [X] Pin Python version to 3.9
  • [X] Get client provisioning job to pass
  • [X] Get dependency containers to build correctly with python 3.9

nedvedba avatar Apr 14 '25 11:04 nedvedba

Here is where it is failing.

https://github.com/ORNL/DataFed/blob/403636d987b753f6591f88d223e2c44871a498b5/scripts/dependency_install_functions.sh#L153

It's because the protobuf requirements.txt file does not include the needed module. I would recommend adding a line to the dependency install script that installs the needed Python dependencies as part of protobuf_install, or fixing it however else you prefer. If you want to fix it from the Ansible side, you will need to add a new role that installs the dependency as part of the setup process for the DataFedPythonClient playbook.

JoshuaSBrown avatar Apr 14 '25 14:04 JoshuaSBrown

Some progress has been made: the GitLab CI was passing, but the GitHub CI was not. After further research, I believe protobuf only supports Python versions up to 3.9, not 3.11; with 3.11 protobuf still works, but the tests do not pass. To get the GitHub CI working, I have updated the installed Python version to 3.9 and set it up to pull the version from the scripts/dependency_versions.sh file. We will see if this works when the CI finishes.
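
One way to guard against the interpreter drifting away from the pinned version is to compare the running Python against the value stored in the versions file. The sketch below assumes the file contains plain shell assignments; the variable name `PYTHON_VERSION_PIN` is hypothetical and may not match what scripts/dependency_versions.sh actually defines.

```python
import re
import sys

# Matches simple shell assignments such as NAME="3.9" or export NAME=3.9
_ASSIGNMENT = re.compile(
    r'^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=["\']?([^"\'\s#]*)["\']?'
)


def read_shell_versions(text):
    """Collect NAME=value pairs from the text of a shell-style versions file."""
    versions = {}
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def interpreter_matches(pinned, version_info=sys.version_info):
    """Return True if version_info has the same major.minor as the pin."""
    major, minor = (int(part) for part in pinned.split(".")[:2])
    return (version_info[0], version_info[1]) == (major, minor)


if __name__ == "__main__":
    # PYTHON_VERSION_PIN is a hypothetical variable name used for illustration.
    sample = 'PYTHON_VERSION_PIN="3.9"\n'
    pin = read_shell_versions(sample)["PYTHON_VERSION_PIN"]
    print("pin:", pin, "matches running interpreter:", interpreter_matches(pin))
```

Comparing only the major and minor components keeps the check tolerant of patch releases.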

nedvedba avatar Apr 24 '25 17:04 nedvedba

A further update: after resolving the Python issue, there seems to be a problem with the nlohmann_json library, which was written for an older version of gcc and does not appear to be compatible with the version installed in Ubuntu 24.04. I therefore tried updating it to its newest version (3.12.0), which breaks the json schema validator library because that library was built for a specific version of nlohmann_json. After updating the json schema validator library to its newest version as well, it cannot find the nlohmann_json library when the core service is built, and the build fails. This is where I am currently stuck; nothing I have tried so far lets the schema validator library find nlohmann_json. I have tried pinning nlohmann_json to the specific version mentioned in the json schema validator CMake files, printing the environment variables that should be required to find it, and modifying the CMake files for those dependencies, all to no avail.

Here is the error

-- Found nlohmann_json: /opt/datafed/dependencies/share/cmake/nlohmann_json/nlohmann_jsonConfig.cmake (found version "3.12.0")
CMake Error at cmake/JSONSchema.cmake:15 (find_package):
  Found package configuration file:

    /opt/datafed/dependencies/lib/cmake/nlohmann_json_schema_validator/nlohmann_json_schema_validatorConfig.cmake

  but it set nlohmann_json_schema_validator_FOUND to FALSE so package
  "nlohmann_json_schema_validator" is considered to be NOT FOUND.
Call Stack (most recent call first):
  cmake/JSONSchema.cmake:27 (find_json_schema_library)
  CMakeLists.txt:172 (include)

JSON schema validator cmake file https://github.com/pboettch/json-schema-validator/blob/349cba9f7e3cb423bbc1811bdd9f6770f520b468/CMakeLists.txt#L42

nedvedba avatar Apr 29 '25 20:04 nedvedba

The nlohmann_json header files exist in the correct include folder; there is no library associated with nlohmann_json.

JoshuaSBrown avatar Apr 29 '25 20:04 JoshuaSBrown

After adding some additional debug information with the commands below, it looks like the problem is that the CMake variable indicating whether the json schema validator was found is being set to false. It is currently unclear why this is happening.

Additional things we have tried:

  • setting LD_LIBRARY_PATH
  • verifying the libraries are actually installed where they say they are
  • checking which libraries the targets are actually linking to

LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/datafed/dependencies/lib/" cmake -S. -B build -DCMAKE_BUILD_TYPE=Debug -DBUILD_WEB_SERVER=OFF -DCMAKE_FIND_DEBUG_MODE=ON
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/datafed/dependencies/lib/" cmake -S. -B build -DCMAKE_BUILD_TYPE=Debug -DBUILD_WEB_SERVER=OFF --trace-expand

nedvedba avatar Apr 30 '25 18:04 nedvedba

This issue has been solved. We were trying to find the config of the nlohmann_json package, which is no longer necessary with the newer version. The schema validator version is also not handled correctly by the find_package command, but this has been worked around by checking the version afterwards and then setting the environment variables differently. However, there is a new issue related to how the json library is used. I am investigating it now.

[ 88%] Building CXX object core/server/CMakeFiles/datafed-core-lib.dir/DatabaseAPI.cpp.o
In file included from /home/cloud/df-tmp/core/server/ClientWorker.cpp:3:
/home/cloud/df-tmp/core/server/ClientWorker.hpp:201:8: error: ‘void SDMS::Core::ClientWorker::error(const nlohmann::json_abi_v3_12_0::json_pointer<nlohmann::json_abi_v3_12_0::basic_json<> >&, const nlohmann::json_abi_v3_12_0::json&, const std::string&)’ marked ‘override’, but does not override
  201 |   void error(const nlohmann::json_pointer<nlohmann::basic_json<>> &a_ptr,
      |        ^~~~~
[ 90%] Building CXX object core/server/CMakeFiles/datafed-core-lib.dir/GlobusAPI.cpp.o
In file included from /home/cloud/df-tmp/core/server/CoreServer.cpp:3:
/home/cloud/df-tmp/core/server/ClientWorker.hpp:201:8: error: ‘void SDMS::Core::ClientWorker::error(const nlohmann::json_abi_v3_12_0::json_pointer<nlohmann::json_abi_v3_12_0::basic_json<> >&, const nlohmann::json_abi_v3_12_0::json&, const std::string&)’ marked ‘override’, but does not override
  201 |   void error(const nlohmann::json_pointer<nlohmann::basic_json<>> &a_ptr,
      |        ^~~~~

nedvedba avatar May 01 '25 18:05 nedvedba

Older branches that run through an older protobuf break things.

AronPerez avatar May 21 '25 13:05 AronPerez