ranking installation silently upgrades the TF version.

Open DbVikas opened this issue 3 years ago • 5 comments

I'm using TF 2.4.1, which is compatible with the Horovod version I have in place. When I install the ranking library, it silently upgrades TF to the latest version, which then conflicts with Horovod. The stack trace follows.

[1,10]<stderr>:Traceback (most recent call last):
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/horovod/tensorflow/mpi_ops.py", line 47, in <module>
[1,10]<stderr>: MPI_LIB = _load_library('mpi_lib' + get_ext_suffix())
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/horovod/tensorflow/mpi_ops.py", line 42, in _load_library
[1,10]<stderr>: library = load_library.load_op_library(filename)
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
[1,10]<stderr>: lib_handle = py_tf.TF_LoadLibrary(library_filename)
[1,10]<stderr>:tensorflow.python.framework.errors_impl.NotFoundError: /local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/horovod/tensorflow/mpi_lib.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN10tensorflow11GetNodeAttrERKNS_9AttrSliceEN4absl14lts_2020_02_2511string_viewEPf
[1,10]<stderr>:
[1,10]<stderr>:The above exception was the direct cause of the following exception:
[1,10]<stderr>:
[1,10]<stderr>:Traceback (most recent call last):
[1,10]<stderr>: File "<string>", line 1, in <module>
[1,10]<stderr>: File "/databricks/.python_edge_libs/sparkdl/horovod/runner.py", line 217, in wrapped_main
[1,10]<stderr>: return_value = main(**kwargs)
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/lodgingsort/runner.py", line 103, in _run_multi_worker_training
[1,10]<stderr>: import horovod.tensorflow as hvd
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/horovod/tensorflow/__init__.py", line 26, in <module>
[1,10]<stderr>: from horovod.tensorflow import elastic
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/horovod/tensorflow/elastic.py", line 24, in <module>
[1,10]<stderr>: from horovod.tensorflow.functions import broadcast_object, broadcast_object_fn, broadcast_variables
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/horovod/tensorflow/functions.py", line 24, in <module>
[1,10]<stderr>: from horovod.tensorflow.mpi_ops import allgather, broadcast
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/horovod/tensorflow/mpi_ops.py", line 49, in <module>
[1,10]<stderr>: check_installed_version('tensorflow', tf.__version__, e)
[1,10]<stderr>: File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a988a17e-1421-48f9-b928-65cd81681867/lib/python3.8/site-packages/horovod/common/util.py", line 268, in check_installed_version
[1,10]<stderr>: raise HorovodVersionMismatchError(name, version, installed_version) from exception
[1,10]<stderr>:horovod.common.exceptions.HorovodVersionMismatchError: Framework tensorflow installed with version 2.4.1 but found version 2.5.0.
[1,10]<stderr>: This can result in unexpected behavior including runtime errors.
[1,10]<stderr>: Reinstall Horovod using `pip install --no-cache-dir` to build with the new version.

I see that the setup file of the ranking project requires 'tensorflow-serving-api >= 2.0.0, < 3.0.0' and 'tf-models-official >= 2.5.0' and installs the latest matching versions. I tried changing these to 2.4.0, but it did not help: installing the 2.4 versions of tensorflow-serving-api and tf-models-official also upgrades TF to 2.5.

It would be a great help if you could suggest a way to use the ranking package with TF 2.4.1. Thanks in advance.

DbVikas avatar Jun 02 '21 15:06 DbVikas

You can uninstall tensorflow and then install TF 2.4.1 again. Some of the functions in tf-ranking may break, but that may not affect your use case.
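A minimal sketch of that uninstall-then-reinstall sequence, driven from Python so it can run inside a notebook or job script. The helper names `reinstall_commands` and `reinstall_tensorflow` are hypothetical, and 2.4.1 is simply the version mentioned in this thread. By default the script only prints the commands; nothing is changed until `dry_run=False` is passed.

```python
import subprocess
import sys


def reinstall_commands(tf_version):
    """Return the pip commands that remove TensorFlow and reinstall one version."""
    # Use "python -m pip" so the commands target the interpreter running this script.
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "--yes", "tensorflow"],
        pip + ["install", f"tensorflow=={tf_version}"],
    ]


def reinstall_tensorflow(tf_version, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in reinstall_commands(tf_version):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # 2.4.1 is the version from this thread; use the one Horovod was built against.
    reinstall_tensorflow("2.4.1")
```

This has to run after tensorflow-ranking is installed, because installing it is what triggers the upgrade. Whether tf-ranking then works against the older TF is not guaranteed, as noted above.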

xuanhuiwang avatar Jun 02 '21 23:06 xuanhuiwang

@xuanhuiwang hello. I am seeing the same issue. I am trying to install an older version of TFR, https://github.com/tensorflow/ranking/releases/tag/v0.3.3 , with TF 2.4.1 already installed. According to the release notes, 0.3.3 is compatible with tensorflow v2.4.1, but the core TF library is still silently upgraded to v2.5.0.

Collecting tensorflow-ranking==0.3.3
  Downloading tensorflow_ranking-0.3.3-py2.py3-none-any.whl (109 kB)
     |████████████████████████████████| 109 kB 3.9 MB/s eta 0:00:01
...
Collecting tensorflow-serving-api<3.0.0,>=2.0.0
  Downloading tensorflow_serving_api-2.5.1-py2.py3-none-any.whl (38 kB)
Requirement already satisfied: absl-py>=0.1.6 in /opt/conda/lib/python3.8/site-packages (from tensorflow-ranking==0.3.3) (0.12.0)
Collecting tensorflow<3,>=2.5.0
  Downloading tensorflow-2.5.0-cp38-cp38-manylinux2010_x86_64.whl (454.4 MB)
     |█████████████████▉              | 253.7 MB 20.7 MB/s eta 0:00:10

It would be great if installing legacy TFR versions did not upgrade core TF when a compatible TF version (i.e., 2.4.1) is already present on the system.

Could you please advise?

Thank you

azagniotov avatar Jun 15 '21 07:06 azagniotov

Same issue. Not sure how to proceed, but this is breaking our pipeline even though we have the version pinned to 0.3.3.
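Until the install behavior is sorted out, a pipeline can at least detect the silent upgrade right after the install step, instead of failing later inside Horovod. The sketch below compares installed versions against expected pins using the standard-library `importlib.metadata`. The helper names and the pin values are illustrative assumptions taken from this thread, not part of tf-ranking.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_version_drift(expected, lookup=installed_version):
    """Map each package whose version differs from its pin to (expected, found)."""
    drift = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            drift[package] = (wanted, found)
    return drift


if __name__ == "__main__":
    # Pins taken from this thread; adjust them to your environment.
    expected = {"tensorflow": "2.4.1", "tensorflow-ranking": "0.3.3"}
    for package, (wanted, found) in find_version_drift(expected).items():
        print(f"{package}: expected {wanted}, found {found}")
```

A pipeline step could exit with a non-zero status whenever the returned dictionary is non-empty, so the version mismatch shows up at install time.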

davidcereal avatar Jul 02 '21 19:07 davidcereal

I guess the reason is that you did not have tensorflow-serving-api installed beforehand. In that case, the latest version of tensorflow-serving-api is installed, which also upgrades the tensorflow package.

What if you manually install tensorflow-serving-api at the same version as your tensorflow, and then install tf-ranking v0.3.3?
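A sketch of that install order, assuming a tensorflow-serving-api release with the same version number as your TF is available on your package index. The helper name `ordered_install_commands` is hypothetical, and the script only prints the commands rather than running them.

```python
import sys


def ordered_install_commands(tf_version, tfr_version):
    """Return pip commands: matching serving API first, then tf-ranking."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        # Installing the serving API at the TF version first means pip finds
        # the requirement already satisfied when tf-ranking asks for it.
        pip + [f"tensorflow-serving-api=={tf_version}"],
        pip + [f"tensorflow-ranking=={tfr_version}"],
    ]


if __name__ == "__main__":
    # Versions taken from this thread; adjust them to your environment.
    for command in ordered_install_commands("2.4.1", "0.3.3"):
        print(" ".join(command))
```

Whether pip then leaves tensorflow untouched depends on the requirements each package declares, so it is worth checking the installed TF version afterwards.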

xuanhuiwang avatar Jul 02 '21 20:07 xuanhuiwang

For our use case (same as @DbVikas, on Databricks), we do not use the TF Serving API, since we only use tf-ranking for training. What is the solution for us?

motiteux avatar Sep 03 '21 13:09 motiteux