TileDB-Py icon indicating copy to clipboard operation
TileDB-Py copied to clipboard

Scikit-build-core build system rework

Open dudoslav opened this issue 1 year ago • 6 comments

This PR reworks build system for TileDB-Py to scikit-build-core + CMake. To comply with our packaging requirements I made sure that the TILEDB_PATH environment variable is properly handled. In all cases RPATH is set to the location where TileDB core library is present.

Additionally, Github workflows were added to create, test and upload PyPI wheels and sdist packages.


[sc-49832]

dudoslav avatar Jun 05 '24 14:06 dudoslav

Let's not forget to update the docs: https://docs.tiledb.com/main/how-to/installation/building-from-source/python It would be good to have a draft PR here: https://github.com/TileDB-Inc/TileDB-Developer-Docs before merging this PR.

kounelisagis avatar Jun 19 '24 13:06 kounelisagis

Let's not forget to update the docs: https://docs.tiledb.com/main/how-to/installation/building-from-source/python It would be good to have a draft PR here: https://github.com/TileDB-Inc/TileDB-Developer-Docs before merging this PR.

same for this: https://github.com/TileDB-Inc/TileDB-Py/blob/367dbe97c0b396b1a413a0b70dae5a81e1f216fa/CONTRIBUTING.md

kounelisagis avatar Jun 19 '24 18:06 kounelisagis

I also re-implemented daily tests CI/CD: https://github.com/dudoslav/TileDB-Py/actions/runs/9596846267

dudoslav avatar Jun 20 '24 11:06 dudoslav

Wheel contents difference: https://www.diffchecker.com/4spoJ46a/

dudoslav avatar Jun 20 '24 14:06 dudoslav

Latest diff with Licences: https://www.diffchecker.com/SxLbtGmC/

dudoslav avatar Jul 03 '24 14:07 dudoslav

Latest Diff: https://www.diffchecker.com/eFdvkppd/

dudoslav avatar Jul 03 '24 15:07 dudoslav

Thanks

  • we can exclude all of doc and ci
  • seems like this is a local thing that should not have been included? image

ihnorton avatar Jul 03 '24 20:07 ihnorton

Thanks

* we can exclude all of `doc` and `ci`

* seems like this is a local thing that should not have been included?
image

I removed doc and ci folders and made sure that I create wheel in a clean environment. This is the result right now: https://www.diffchecker.com/9i4Ge9l2/

Another difference that I found is that the original setup.py approach puts the libtiledb.so in a different place. I did not notice any difference when running the wheel or testing it, but I want to mention it:

1720088026_grim 1720088016_grim

dudoslav avatar Jul 04 '24 10:07 dudoslav

I have noticed the following packaging behavior:

When running pipx run build if TileDB core library is found in the system it gets picked up and used as the core TileDB library for the wheel. This however never happens in our CI/CD when deploying wheels, so it should be fine, but it can lead to incorrect versions of TileDB included inside wheel for packages produced manually, for example in a Conda environment with TileDB installed in it.

dudoslav avatar Jul 04 '24 11:07 dudoslav

Testing if Conda feedstock will work: https://github.com/conda-forge/tiledb-py-feedstock/pull/224

dudoslav avatar Jul 10 '24 14:07 dudoslav

Seems like conda packages were able to produce: https://github.com/conda-forge/tiledb-py-feedstock/pull/224

dudoslav avatar Jul 11 '24 10:07 dudoslav

I tested the shared object copying behavior similar to how I did for TileDB-VCF (https://github.com/TileDB-Inc/TileDB-VCF/issues/717).

I ran into 2 issues:

  • When bundling libtiledb.so into the installed Python package, I get the error AttributeError: module 'tiledb.cc' has no attribute 'Config' when running import tiledb
  • When using an external libtiledb.so, the compilation fails
docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes git python-is-python3 python3 python3-pip python3-venv unzip wget
cd
mkdir downloads install-libtiledb

# Install nightly binary
the_date=2024-04-08
wget --quiet -P downloads/ \
  https://github.com/jdblischak/centralized-tiledb-nightlies/releases/download/$the_date/libtiledb-$the_date.tar.gz
tar -C install-libtiledb -xzf downloads/libtiledb-$the_date.tar.gz
export LD_LIBRARY_PATH=$(pwd)/install-libtiledb/lib

# Clone TileDB-Py
git clone https://github.com/TileDB-Inc/TileDB-Py.git
cd TileDB-Py/
git checkout db/scikit-build-core

# Build and install TileDB-Py **with** bundled libtiledb
python -m pip install -v .
##   -- Downloading TileDB default version ...
##   -- [download 100% complete]
##   -- Adding "libtiledb" into install group
##
##   -- Installing: /tmp/tmppmr6xlbn/wheel/platlib/tiledb/libtiledb.so.2.24
##
## Successfully installed numpy-2.0.0 tiledb-0.30.3.dev109+gd18362b

ls /usr/local/lib/python3.10/dist-packages/tiledb/*.so*
## /usr/local/lib/python3.10/dist-packages/tiledb/cc.cpython-310-x86_64-linux-gnu.so
## /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so
## /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.so.2.24
## /usr/local/lib/python3.10/dist-packages/tiledb/main.cpython-310-x86_64-linux-gnu.so
ldd /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so | grep libtiledb
## libtiledb.so.2.24 => /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.so.2.24 (0x00007f3ac2e8a000)
readelf -d /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so | grep RUNPATH
## 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
python -c "import tiledb; tiledb.libtiledb.version()"
## Traceback (most recent call last):
##   File "<string>", line 1, in <module>
##   File "/root/TileDB-Py/tiledb/__init__.py", line 22, in <module>
##     from .array_schema import ArraySchema
##   File "/root/TileDB-Py/tiledb/array_schema.py", line 10, in <module>
##     from .attribute import Attr
##   File "/root/TileDB-Py/tiledb/attribute.py", line 9, in <module>
##     from .ctx import Ctx, CtxMixin
##   File "/root/TileDB-Py/tiledb/ctx.py", line 16, in <module>
##     class Config(lt.Config):
## AttributeError: module 'tiledb.cc' has no attribute 'Config'

python -m pip uninstall --yes tiledb

# Build and install TileDB-Py **without** bundled libtiledb
export TILEDB_PATH=${HOME}/install-libtiledb/
python -m pip install -v .
## -- Found external TileDB core library
## -- Setting RPATH for targets "main" and "libtiledb" to /root/install-libtiledb/lib
## -- Setting RPATH for target "cc" to /root/install-libtiledb/lib
##
##   In file included from /tmp/tmp8dwduzb1/build/tiledb/libtiledb.cc:1248:
##   /root/install-libtiledb/include/tiledb/tiledb.h:2973:34: note: declared here
##    2973 | TILEDB_DEPRECATED_EXPORT int32_t tiledb_array_delete_fragments(
##         |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##   ninja: build stopped: subcommand failed.
##
##   *** CMake build failed
##   error: subprocess-exited-with-error
##
##   × Building wheel for tiledb (pyproject.toml) did not run successfully.
##   │ exit code: 1
##   ╰─> See above for output.
##
##   note: This error originates from a subprocess, and is likely not a problem with pip.
##   full command: /usr/bin/python /usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpkbu_gaxq
##   cwd: /root/TileDB-Py
##   Building wheel for tiledb (pyproject.toml) ... error
##   ERROR: Failed building wheel for tiledb
## Failed to build tiledb
## ERROR: Could not build wheels for tiledb, which is required to install pyproject.toml-based projects

jdblischak avatar Jul 11 '24 18:07 jdblischak

@ihnorton explained to me the source of the import error I reported above. It's because I ran it within the Git repo, and by default Python looks for ./module.py and ./module/__init__.py. Thus it ran the local one instead of the one installed in site-packages. Thus the fix it to either run an editable install (-e) or move outside of the Git repo.

The reprex below confirms that the import works fine outside of the Git repo:

docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes git python-is-python3 python3 python3-pip python3-venv unzip wget
cd

# Clone TileDB-Py
git clone https://github.com/TileDB-Inc/TileDB-Py.git
cd TileDB-Py/
git checkout db/scikit-build-core

# Build and install TileDB-Py **with** bundled libtiledb
python -m pip install -v .

# Run import test outside of Git repo
cd
python -c "import tiledb; print(tiledb.libtiledb.version())"
## (2, 24, 0)

jdblischak avatar Jul 12 '24 18:07 jdblischak

Also, I switched to using the 2.24.2 release tarball, and I was able to compile TileDB-Py against the external libtiledb. The shared object isn't copied, and all the linking looks good.

docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes git python-is-python3 python3 python3-pip python3-venv unzip wget
cd
mkdir downloads install-libtiledb

# Install 2.24.2 release binary
# https://github.com/TileDB-Inc/TileDB/releases/tag/2.24.2
wget --quiet -P downloads/ \
  https://github.com/TileDB-Inc/TileDB/releases/download/2.24.2/tiledb-linux-x86_64-2.24.2-76cd03c.tar.gz
tar -C install-libtiledb -xzf downloads/tiledb-linux-x86_64-2.24.2-76cd03c.tar.gz
export LD_LIBRARY_PATH=$(pwd)/install-libtiledb/lib

# Clone TileDB-Py
git clone https://github.com/TileDB-Inc/TileDB-Py.git
cd TileDB-Py/
git checkout db/scikit-build-core

# Build and install TileDB-Py **without** bundled libtiledb
export TILEDB_PATH=${HOME}/install-libtiledb/
python -m pip install -vv .
##  -- Found external TileDB core library
##  -- Setting RPATH for targets "main" and "libtiledb" to /root/install-libtiledb/lib
##  -- Setting RPATH for target "cc" to /root/install-libtiledb/lib

ls /usr/local/lib/python3.10/dist-packages/tiledb/*.so*
## /usr/local/lib/python3.10/dist-packages/tiledb/cc.cpython-310-x86_64-linux-gnu.so
## /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so
## /usr/local/lib/python3.10/dist-packages/tiledb/main.cpython-310-x86_64-linux-gnu.so

ldd /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so | grep libtiledb
## libtiledb.so.2.24 => /root/install-libtiledb/lib/libtiledb.so.2.24 (0x00007f7dda6f6000)

readelf -d /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so | grep RUNPATH
##  0x000000000000001d (RUNPATH)            Library runpath: [/root/install-libtiledb/lib]

cd
python -c "import tiledb; print(tiledb.libtiledb.version())"
## (2, 24, 2)

jdblischak avatar Jul 12 '24 19:07 jdblischak

Lastly, I got this PR working in a branch of my centralized-tiledb-nightlies, so I'll be able to quickly migrate the nightly TileDB-Py build after this is merged.

jdblischak avatar Jul 12 '24 19:07 jdblischak

The wheel size is most probably due to strip=False and build_type=RelWithDebInfo. Latest commit removed these options, lets see if the size will get smaller.

dudoslav avatar Jul 22 '24 12:07 dudoslav

1721652445_grim Latest wheels are about 15MB in size

dudoslav avatar Jul 22 '24 12:07 dudoslav