Add option to install libtiledb.so
This PR adds an option to include or exclude libtiledb.so from the install group. The install group determines what goes into the PyPI packages and, because the Conda recipes are built from those packages, into the Conda packages as well. This lets us produce the following configurations:
- PyPI wheels should contain libtiledb.so, so we install with the default variables:
pip install .
- PyPI sdist should produce the same package as the wheel (when finished), so the default CMake environment includes libtiledb.so in the install group.
- The Conda recipe uses the PyPI sdist, so we have to modify its script to execute:
TileDB_DIR=${PREFIX} INSTALL_LIB_TILEDB="OFF" pip install --no-deps .
This way we use libtiledb from the Conda prefix and also remove it from the install group.
This PR's sole purpose is to enable Conda build to not package libtiledb.so. No other packages/functionality should be affected.
Conda feedstock PR (that requires this change): https://github.com/TileDB-Inc/tiledb-vector-search-feedstock/pull/35
pip install .
...
-- Installing: /tmp/tmpeetkpcfm/wheel/platlib/tiledb/vector_search/./_tiledbvspy.cpython-311-x86_64-linux-gnu.so
-- Set non-toolchain portion of runtime path of "/tmp/tmpeetkpcfm/wheel/platlib/tiledb/vector_search/./_tiledbvspy.cpython-311-x86_64-linux-gnu.so" to "$ORIGIN/lib"
-- Installing: /tmp/tmpeetkpcfm/wheel/platlib/tiledb/vector_search/lib/libtiledb.so.2.22 << HERE
*** Installing project into wheel...
-- Install configuration: "Release"
*** Making wheel...
*** Created tiledb_vector_search-0.2.3.dev41+gda176afc.d20240506-cp311-cp311-linux_x86_64.whl...
Building wheel for tiledb-vector-search (pyproject.toml) ... done
Created wheel for tiledb-vector-search: filename=tiledb_vector_search-0.2.3.dev41+gda176afc.d20240506-cp311-cp311-linux_x86_64.whl size=6868083 sha256=9ed54d767b3f7a380d32e39437ff8ee5ff8eb88e65894aa7dc1389b13fdae9e1
Stored in directory: /home/dudoslav/.cache/pip/wheels/16/d2/38/a9b8e505638ba122beb55762af7fcf1d8a013d27ebdf19587f
Successfully built tiledb-vector-search
Installing collected packages: tiledb-vector-search
Attempting uninstall: tiledb-vector-search
Found existing installation: tiledb-vector-search 0.2.3.dev41+gda176afc.d20240506
Uninstalling tiledb-vector-search-0.2.3.dev41+gda176afc.d20240506:
Removing file or directory /home/dudoslav/.miniforge3/envs/zoo/lib/python3.11/site-packages/tiledb/vector_search/
Removing file or directory /home/dudoslav/.miniforge3/envs/zoo/lib/python3.11/site-packages/tiledb_vector_search-0.2.3.dev41+gda176afc.d20240506.dist-info/
Successfully uninstalled tiledb-vector-search-0.2.3.dev41+gda176afc.d20240506
Successfully installed tiledb-vector-search-0.2.3.dev41+gda176afc.d20240506
INSTALL_LIB_TILEDB="OFF" pip install .
...
-- Installing: /tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/include/experimental/__p2630_bits/submdspan_extents.hpp
-- Installing: /tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/include/experimental/__p2630_bits/submdspan_mapping.hpp
-- Installing: /tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/include/experimental/__p2642_bits
-- Installing: /tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/include/experimental/__p2642_bits/layout_padded.hpp
-- Installing: /tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/include/experimental/__p2642_bits/layout_padded_fwd.hpp
-- Installing: /tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/lib/cmake/mdspan/mdspanConfig.cmake
-- Installing: /tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/lib/cmake/mdspan/mdspanConfigVersion.cmake
-- Installing: /tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/./_tiledbvspy.cpython-311-x86_64-linux-gnu.so
-- Set non-toolchain portion of runtime path of "/tmp/tmp49y6peo6/wheel/platlib/tiledb/vector_search/./_tiledbvspy.cpython-311-x86_64-linux-gnu.so" to "$ORIGIN/lib"
*** Installing project into wheel...
-- Install configuration: "Release"
*** Making wheel...
*** Created tiledb_vector_search-0.2.3.dev41+gda176afc.d20240506-cp311-cp311-linux_x86_64.whl...
Building wheel for tiledb-vector-search (pyproject.toml) ... done
Created wheel for tiledb-vector-search: filename=tiledb_vector_search-0.2.3.dev41+gda176afc.d20240506-cp311-cp311-linux_x86_64.whl size=1271127 sha256=25fcabc87bac093aaf2d21757f75afe69dc4e66a8ec3d6d52dbb320fe23e15b0
Stored in directory: /home/dudoslav/.cache/pip/wheels/16/d2/38/a9b8e505638ba122beb55762af7fcf1d8a013d27ebdf19587f
Successfully built tiledb-vector-search
Installing collected packages: tiledb-vector-search
Attempting uninstall: tiledb-vector-search
Found existing installation: tiledb-vector-search 0.1.1.dev42+gd8f58b51
Uninstalling tiledb-vector-search-0.1.1.dev42+gd8f58b51:
Removing file or directory /home/dudoslav/.miniforge3/envs/zoo/lib/python3.11/site-packages/tiledb/vector_search/
Removing file or directory /home/dudoslav/.miniforge3/envs/zoo/lib/python3.11/site-packages/tiledb_vector_search-0.1.1.dev42+gd8f58b51.dist-info/
Successfully uninstalled tiledb-vector-search-0.1.1.dev42+gd8f58b51
Successfully installed tiledb-vector-search-0.2.3.dev41+gda176afc.d20240506
No libtiledb.so in sight.
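The manual comparison of the two logs above can be automated. The following is a minimal sketch, assuming the install step keeps printing `-- Installing: <path>` lines as in the logs above and that the library follows the Linux naming `libtiledb.so*`; the function name `bundled_libtiledb` is hypothetical and not part of this PR.

```shell
# Read a verbose pip/CMake build log on stdin and print every libtiledb
# shared library that the install step copied into the wheel.
# Returns non-zero when the log contains no such line.
# Assumption: Linux naming (libtiledb.so*); macOS would need another pattern.
bundled_libtiledb() {
    awk '/-- Installing: .*\/libtiledb\.so/ { print $3; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Possible usage (not executed here):
#   pip install -v . 2>&1 | bundled_libtiledb || echo "libtiledb.so was not bundled"
```

With the default variables the function should print the path marked `<< HERE` in the first log; with INSTALL_LIB_TILEDB="OFF" it should print nothing and fail.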
I just noticed that when I run:
TILEDB_PATH=${CONDA_PREFIX} pip install --no-deps .
importing the package fails:
(zoo) taco:TileDB-Vector-Search dudoslav$ python -c 'import tiledb.vector_search'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dudoslav/.miniforge3/envs/zoo/lib/python3.11/site-packages/tiledb/vector_search/__init__.py", line 5, in <module>
from .flat_index import FlatIndex
File "/home/dudoslav/.miniforge3/envs/zoo/lib/python3.11/site-packages/tiledb/vector_search/flat_index.py", line 5, in <module>
from tiledb.vector_search import index
File "/home/dudoslav/.miniforge3/envs/zoo/lib/python3.11/site-packages/tiledb/vector_search/index.py", line 7, in <module>
from tiledb.vector_search import _tiledbvspy as vspy
ImportError: libtiledb.so.2.22: cannot open shared object file: No such file or directory
So this needs fixing.
OK, the shared object search path was modified here:
if (APPLE)
set_target_properties(${VSPY_TARGET_NAME} PROPERTIES INSTALL_RPATH "@loader_path/lib")
elseif(UNIX)
set_target_properties(${VSPY_TARGET_NAME} PROPERTIES INSTALL_RPATH "\$ORIGIN/lib")
endif()
So this logic should only run when we also package libtiledb.so.
Why are we adding TILEDB_PATH here rather than just using TileDB_DIR?
@ihnorton I believe that was at my request. I would like to keep the build configuration options for vector search, soma, and vcf as consistent as possible. SOMA also uses TILEDB_PATH
I would like to keep the build configuration options for vector search, soma, and vcf as consistent as possible. SOMA also uses TILEDB_PATH
Isn't that only an environment variable, not a CMake variable? https://github.com/search?q=repo%3Asingle-cell-data%2FTileDB-SOMA%20tiledb_path&type=code
Isn't that only an environment variable, not a CMake variable?
Yes, good point. I only care about the name of the env var, not the CMake variable
OK so latest changes and behavior:
- You can set TILEDB_PATH. This just sets TileDB_DIR automatically for you and that's all, so find_package will look for it there.
- If libtiledb has been downloaded during the superbuild step, a variable TILEDB_DOWNLOADED will be passed to the subsequent build. (This is just an implementation detail.)
- During the TileDB-Py build (so after the superbuild), if TILEDB_DOWNLOADED is set, libtiledb will be included in the install group and therefore in the wheel or Conda package. If it is not set, it won't be included.
- Note that all of this happens implicitly, so you need to monitor what is really happening. For example, if you use a conda environment that has tiledb installed, the build will pick that one up and won't download tiledb for you, even when producing wheels. That might be what we want.
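One way to monitor what the build actually did is to scan the verbose log for the two status messages quoted elsewhere in this thread (`-- Found TileDB: <path>` and `-- Adding TileDB as an external project`). The sketch below assumes those messages keep their current wording; the function name `tiledb_source` is hypothetical.

```shell
# Read a verbose build log on stdin and report where libtiledb came from:
#   "downloaded"       the superbuild added TileDB as an external project
#   "external <path>"  find_package located an existing libtiledb
#   "unknown"          neither message was seen (returns non-zero)
# Assumption: the message wording matches the logs shown in this thread.
tiledb_source() {
    awk '/Adding TileDB as an external project/ { downloaded = 1 }
         /-- Found TileDB: /                    { path = $4 }
         END {
             if (downloaded) {
                 print "downloaded"
             } else if (path != "") {
                 print "external " path
             } else {
                 print "unknown"
                 exit 1
             }
         }'
}

# Possible usage (not executed here):
#   python -m pip install -v . 2>&1 | tiledb_source
```

The "downloaded" check takes priority because a superbuild that downloads TileDB may still print a `Found TileDB` line in the subsequent build.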
Thanks for working on this @dudoslav! I apologize for my delay in reviewing this PR.
I did some local testing. Within a conda environment, it works as I expected. However, outside of a conda environment, I am having trouble getting it to find libtiledb.so via TILEDB_PATH. Am I doing something wrong?
Also, is there a small test that I can run to confirm that tiledb_vector_search is properly linked to the correct libtiledb.so? I noticed that neither RPATH nor RUNPATH is set for the Python shared object (i.e. _tiledbvspy.cpython-XX-x86_64-linux-gnu.so).
The following demonstrates that within a conda env, the build finds libtiledb.so and does not copy it unnecessarily.
# conda
mamba create --yes -n test-vector-so \
-c conda-forge --override-channels \
python=3.9 tiledb
## + python 3.9.19 h0755675_0_cpython conda-forge Cached
## + tiledb 2.23.0 hfa691db_2 conda-forge 4MB
mamba activate test-vector-so
ls $CONDA_PREFIX/lib/libtiledb*
## /home/wsl/mambaforge/envs/test-vector-so/lib/libtiledb.so
## /home/wsl/mambaforge/envs/test-vector-so/lib/libtiledb.so.2.23
## current behavior
git log -n 1 --pretty=reference
## f564a8d (Share max int / float values and index lists in Python (#396), 2024-05-31)
python -m pip install -v .
## -- Found TileDB: /home/wsl/mambaforge/envs/test-vector-so/lib/libtiledb.so.2.23
## -- Installing: /tmp/tmp3nkssy3o/wheel/platlib/tiledb/vector_search/lib/libtiledb.so.2.23
ls $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/lib
## cmake libdocopt.a libtiledb.so.2.23 pkgconfig
python -m pip uninstall -v --yes tiledb_vector_search
## proposed behavior
git checkout db/sc-46695/install_group
git log -n 1 --pretty=reference
## 08e8fda (Change install directory, 2024-05-14)
python -m pip install -v .
## -- Found TileDB: /home/wsl/mambaforge/envs/test-vector-so/lib/libtiledb.so.2.23
ls $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/lib
## cmake libdocopt.a pkgconfig
ldd $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/_tiledbvspy.cpython-39-x86_64-linux-gnu.so
## linux-vdso.so.1 (0x00007ffd0c919000)
## libtiledb.so.2.23 => not found
## libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f29d6197000)
## libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f29d60b0000)
## libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f29d6090000)
## libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f29d5e67000)
## /lib64/ld-linux-x86-64.so.2 (0x00007f29d6742000)
readelf -d $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/_tiledbvspy.cpython-39-x86_64-linux-gnu.so | grep R*PATH
python -c "from tiledb.vector_search.version import version; print(version)"
## 0.3.1.dev28+g08e8fda
mamba deactivate
The following demonstrates that setting TILEDB_PATH is not sufficient for finding an external libtiledb.so (at least not in the way I expected to be able to use it).
# No conda
# Download pre-built libtiledb binary
wget -P /tmp/ https://github.com/TileDB-Inc/TileDB/releases/download/2.23.0/tiledb-linux-x86_64-2.23.0-152093b.tar.gz
mkdir /tmp/tmp-tiledb
tar xzf /tmp/tiledb-linux-x86_64-2.23.0-152093b.tar.gz -C /tmp/tmp-tiledb
ls /tmp/tmp-tiledb/lib/
## cmake libtiledb.so libtiledb.so.2.23 pkgconfig
# Create Python virtual env
python -m venv ./venv-vector
mamba deactivate
source ./venv-vector/bin/activate
## proposed behavior
export TILEDB_PATH=/tmp/tmp-tiledb
python -m pip install -v .
## -- Setting TileDB_DIR to /tmp/tmp-tiledb
## -- Could NOT find TileDB (missing: TileDB_DIR)
## -- Adding TileDB as an external project
export TILEDB_PATH=/tmp/tmp-tiledb/lib
python -m pip install -v .
## -- Setting TileDB_DIR to /tmp/tmp-tiledb/lib
## -- Could NOT find TileDB (missing: TileDB_DIR)
## -- Adding TileDB as an external project
deactivate
Thanks @jdblischak for your reply. Well, I am not sure which path we originally used there, but right now a path to TileDBConfig.cmake is needed. This works:
(venv-vector) taco:TileDB-Vector-Search dudoslav$ export TILEDB_PATH=/tmp/tmp-tiledb/lib/cmake/TileDB
(venv-vector) taco:TileDB-Vector-Search dudoslav$ python -m pip install -v .
...
...
-- Found TileDB: /tmp/tmp-tiledb/lib/libtiledb.so.2.23
Of course, I can change this behavior. Another question is: Should I also set RPath even if I don't copy libtiledb.so into the wheel? So in this case set RPath to /tmp/tmp-tiledb/lib/libtiledb.so.2.23?
right now a path to TileDBConfig.cmake is needed this works
My proposal would be the following:
- TILEDB_PATH is the path to the installation directory. For a conda env this would be $CONDA_PREFIX, for conda-build this would be $PREFIX, and for CMake this would be CMAKE_INSTALL_PREFIX
- If TILEDB_PATH is defined, then update TileDB_DIR to $TILEDB_PATH/lib/cmake/TileDB
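The proposed mapping can be illustrated in shell. This is only a sketch of the rule stated above, not the project's CMake logic; the function name `tiledb_dir_from_path` is hypothetical, and the `lib/cmake/TileDB` layout is taken from the path that worked earlier in this thread.

```shell
# Map an installation prefix to the directory expected to hold
# TileDBConfig.cmake. The prefix is taken from $1, falling back to the
# TILEDB_PATH environment variable. Fails when neither is set.
tiledb_dir_from_path() {
    prefix=${1:-${TILEDB_PATH:-}}
    [ -n "$prefix" ] || return 1
    # Strip one trailing slash so the result has no "//".
    printf '%s\n' "${prefix%/}/lib/cmake/TileDB"
}

# Possible usage (not executed here):
#   TileDB_DIR=$(tiledb_dir_from_path "$CONDA_PREFIX") pip install --no-deps .
```

For the pre-built binary unpacked to /tmp/tmp-tiledb, this yields /tmp/tmp-tiledb/lib/cmake/TileDB, the same directory that had to be passed by hand above.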
Another question is: Should I also set RPath even if I don't copy libtiledb.so into the wheel? So in this case set RPath to /tmp/tmp-tiledb/lib/libtiledb.so.2.23?
I think it makes sense to set RPath even in the case of using an external libtiledb.so. This will prevent runtime errors. On Linux setting LD_LIBRARY_PATH usually works, but this same trick typically doesn't work on macOS because of SIP, which blocks DYLD_LIBRARY_PATH.
@jdblischak I modified the build logic to ALWAYS set RPATH. When libtiledb.so is installed (so for wheels), the RPATH is set to a relative path: $ORIGIN/lib on Linux, @loader_path/lib on macOS. When an external libtiledb.so is used, the RPATH is set to the directory that contains it. So for the second example you mentioned a few comments ago, the resulting readelf output looks like this:
(venv-vector) taco:TileDB-Vector-Search dudoslav$ readelf -d venv-vector/lib/python3.10/site-packages/tiledb/vector_search/_tiledbvspy.cpython-310-x86_64-linux-gnu.so
Dynamic section at offset 0x358220 contains 30 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libtiledb.so.2.23]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2]
0x000000000000001d (RUNPATH) Library runpath: [/tmp/tmp-tiledb/lib]
0x000000000000000c (INIT) 0x6c000
...
Additionally, I modified the search logic to use CMAKE_PREFIX_PATH instead of TileDB_DIR, so that even export TILEDB_PATH=/tmp/tmp-tiledb works.
Please let me know if additional changes are required.
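The RPATH rule described in this comment can be restated as a small decision function. This is a shell illustration only; the real logic lives in the project's CMake files, which are not shown in this thread, and the function name `expected_rpath` and its arguments are hypothetical.

```shell
# Print the RPATH that the rule above would produce.
#   $1: "bundled" when libtiledb is installed into the wheel, "external" otherwise
#   $2: path to the external libtiledb (ignored when bundled)
#   $3: platform name as printed by `uname -s` (defaults to the current platform)
expected_rpath() {
    mode=$1 lib=${2:-} os=${3:-$(uname -s)}
    case "$mode" in
        bundled)
            # Relative to the extension module, so the wheel stays relocatable.
            if [ "$os" = "Darwin" ]; then
                printf '%s\n' '@loader_path/lib'
            else
                printf '%s\n' '$ORIGIN/lib'
            fi ;;
        external)
            # Absolute directory that contains the external library.
            [ -n "$lib" ] || return 1
            dirname "$lib" ;;
        *)
            return 1 ;;
    esac
}
```

For the external library at /tmp/tmp-tiledb/lib/libtiledb.so.2.23 this gives /tmp/tmp-tiledb/lib, which matches the RUNPATH entry in the readelf output above.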
I re-ran my tests from above. From my perspective everything looks good. When an external libtiledb.so is used, it isn't copied unnecessarily into the installed Python package. I assume the existing CI ensures that the shared object is copied into the wheel for distribution to PyPI.
I did notice that ldd returned many more results in the conda env after modifying the RPATH, but I assume that is because there are so many shared objects installed there. I don't expect this to be a problem, but I wanted to point out the change in case that could signal a potential problem.
# conda
mamba activate test-vector-so
ls $CONDA_PREFIX/lib/libtiledb*
## /home/wsl/mambaforge/envs/test-vector-so/lib/libtiledb.so
## /home/wsl/mambaforge/envs/test-vector-so/lib/libtiledb.so.2.23
# Clean up from last time and update
python -m pip uninstall -v tiledb-vector-search
git pull upstream db/sc-46695/install_group
git log -n 1 --pretty=reference
## a7bf012 (Fix TileDB search directory and use RPATH, 2024-06-04)
python -m pip install -v .
## -- Found TileDB: /home/wsl/mambaforge/envs/test-vector-so/lib/libtiledb.so.2.23
ls $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/lib
## cmake libdocopt.a pkgconfig
ldd $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/_tiledbvspy.cpython-39-x86_64-linux-gnu.so | tail -n 3
## libicui18n.so.73 => /home/wsl/mambaforge/envs/test-vector-so/lib/./././libicui18n.so.73 (0x00007f39aa65b000)
## libicuuc.so.73 => /home/wsl/mambaforge/envs/test-vector-so/lib/./././libicuuc.so.73 (0x00007f39aa44f000)
## libicudata.so.73 => /home/wsl/mambaforge/envs/test-vector-so/lib/./././libicudata.so.73 (0x00007f39a85c0000)
ldd $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/_tiledbvspy.cpython-39-x86_64-linux-gnu.so | wc -l
## 95
ldd $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/_tiledbvspy.cpython-39-x86_64-linux-gnu.so | grep libtiledb
## libtiledb.so.2.23 => /home/wsl/mambaforge/envs/test-vector-so/lib/libtiledb.so.2.23 (0x00007f8c8d5ee000)
readelf -d $CONDA_PREFIX/lib/python3.9/site-packages/tiledb/vector_search/_tiledbvspy.cpython-39-x86_64-linux-gnu.so | grep R*PATH
## 0x000000000000001d (RUNPATH) Library runpath: [/home/wsl/mambaforge/envs/test-vector-so/lib]
python -c "from tiledb.vector_search.version import version; print(version)"
## 0.3.1.dev29+ga7bf012
mamba deactivate
# No conda
# Download pre-built libtiledb binary
wget -P /tmp/ https://github.com/TileDB-Inc/TileDB/releases/download/2.23.0/tiledb-linux-x86_64-2.23.0-152093b.tar.gz
mkdir /tmp/tmp-tiledb
tar xzf /tmp/tiledb-linux-x86_64-2.23.0-152093b.tar.gz -C /tmp/tmp-tiledb
ls /tmp/tmp-tiledb/lib/
## cmake libtiledb.so libtiledb.so.2.23 pkgconfig
mamba deactivate
source ./venv-vector/bin/activate
# Clean up from last time
python -m pip uninstall -v tiledb-vector-search
export TILEDB_PATH=/tmp/tmp-tiledb
python -m pip install -v .
## -- Setting TileDB_DIR to /tmp/tmp-tiledb
## -- Could NOT find TileDB (missing: TileDB_DIR)
## -- Adding TileDB as an external project
export TILEDB_PATH=/tmp/tmp-tiledb/lib
python -m pip install -v .
## -- Adding TILEDB_PATH to CMAKE_PREFIX_PATH
## -- Found TileDB: /tmp/tmp-tiledb/lib/libtiledb.so.2.23
## -- TileDB_DIR is set to /tmp/tmp-tiledb/lib/cmake/TileDB -- find_package will search there first.
## -- Found TileDB: /tmp/tmp-tiledb/lib/libtiledb.so.2.23
## -- Setting RPATH to /tmp/tmp-tiledb/lib
ls ./venv-vector/lib/python3.10/site-packages/tiledb/vector_search/lib
## cmake libdocopt.a pkgconfig
ldd ./venv-vector/lib/python3.10/site-packages/tiledb/vector_search/_tiledbvspy.cpython-310-x86_64-linux-gnu.so | wc -l
## 10
ldd ./venv-vector/lib/python3.10/site-packages/tiledb/vector_search/_tiledbvspy.cpython-310-x86_64-linux-gnu.so | grep libtiledb
## libtiledb.so.2.23 => /tmp/tmp-tiledb/lib/libtiledb.so.2.23 (0x00007f1db76b1000)
readelf -d ./venv-vector/lib/python3.10/site-packages/tiledb/vector_search/_tiledbvspy.cpython-310-x86_64-linux-gnu.so | grep R*PATH
## 0x000000000000001d (RUNPATH) Library runpath: [/tmp/tmp-tiledb/lib]
python -c "from tiledb.vector_search.version import version; print(version)"
## 0.3.1.dev29+ga7bf012
deactivate
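The ldd checks used in the transcripts above could be wrapped in a small helper that fails when libtiledb is not resolved, which would make them usable as a quick linkage test. This is a minimal sketch that parses `ldd` output in the Linux format shown above; the function name `resolved_libtiledb` is hypothetical.

```shell
# Read `ldd` output on stdin and print the path that libtiledb resolved to.
# Returns non-zero when libtiledb is absent from the output or is reported
# as "not found".
resolved_libtiledb() {
    awk '$1 ~ /^libtiledb\.so/ {
             seen = 1
             # An unresolved entry looks like "libtiledb.so.X => not found".
             if ($3 == "not") { bad = 1 } else { print $3 }
         }
         END { exit ((seen && !bad) ? 0 : 1) }'
}

# Possible usage (not executed here; the .so path is a placeholder):
#   ldd path/to/_tiledbvspy.cpython-XX-x86_64-linux-gnu.so | resolved_libtiledb
```

Run against the earlier build without RPATH, the helper would fail on the `libtiledb.so.2.23 => not found` line; against the builds in this comment it should print the conda or /tmp/tmp-tiledb library path.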