What to do with Python code?
I see this note in the docs:
"If you don't need a binary build, you don't need to use a binary build backend! There are some very good Python build backends"
Fair enough -- for my pure python projects, I'll look elsewhere.
However, a compiled-code plus Python project is pretty common -- I know I almost always have at least a bit of Python calling into my compiled code.
And it seems that scikit-build does support that, but I'm having a hard time finding docs or examples as to how to do it "properly". (https://github.com/scikit-build/scikit-build-sample-projects is pretty sparse -- looking for contributions?)
- Is it possible to leverage hatch, setuptools, poetry, or ??? to do the Python part, while using scikit-build for the compiled parts?
- If not, then how do you get the Python package built properly? Any good examples out there?
The problem at hand:
I have a large package that's a lot of Python and moderate amount of C++ wrapped with Cython.
We've been building it for years with setuptools and a pile of hacks -- scikit-build is MUCH better for the compiled bits.
And it's working now -- but we have a few issues:
- An apparent bug in editable mode (#757).
- A HUGE amount of extra cruft being installed with the package.
  - It seems to be simply copying everything in. (It may not be copying the stuff in .gitignore, but there's the known issue of it not respecting nested .gitignores, which we make heavy use of.)
  - I can probably clean that up with sdist.include and sdist.exclude, but it would be nice if there was a more automated way to do that -- ala setuptools find_packages, or ??
- The harder issue is that I'm getting generated and source files included in the wheel, e.g. for a Cython module:
  - The *.pyx file
  - The *.cpp file
  - Then the *.so -- which should be the only one in the wheel, yes?
Maybe I haven't found the right docs yet -- if so, point me to them!
> However, a compiled-code plus Python project is pretty common -- I know I almost always have at least a bit of Python calling into my compiled code.
For pure Python files, it's just copying the files. See wheel.packages for how to copy files directly, but that's only one approach. A way that depends less on the Python build backend is to make the CMake build system as complete as possible.
> - The harder issue is that I'm getting generated and source files included in the wheel, e.g. for a Cython module:
>   - The *.pyx file
>   - The *.cpp file
>   - Then the *.so -- which should be the only one in the wheel, yes?
I think the key point here is whether the code to be compiled is structured as a CMake project. If so, you are 90% there; otherwise, consider creating the CMake structure and going from there, or use another build backend like meson. In that process you define the libraries and where they need to be installed, and the installation of the Python files is defined there too. I believe it's more helpful to get familiar with the CMake build system first; then the question becomes how to change the install path so that the Python build system puts everything in the correct place.
> A HUGE amount of extra cruft being installed with the package.
Are you referring to the sdist (git archive source files) or the wheel (pre-built binaries)? If it's the latter, then indeed that's a problem, and it would be useful to know how you got there. For the former, please keep it as big and complete as possible; downstream packagers really need those.
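To tell which of the two archives carries the cruft, it can help to list what actually landed in the built wheel. The following is only a minimal sketch using the standard library; the pattern list and the function name are assumptions made for illustration, not anything scikit-build-core provides.

```python
import fnmatch
import zipfile

# Hypothetical patterns for files that normally should not ship in a wheel.
UNWANTED = ["*.pyx", "*.cpp"]


def unexpected_wheel_files(wheel, patterns=UNWANTED):
    """Return wheel members whose file name matches any of the patterns.

    `wheel` may be a path to a .whl file or an open binary file object.
    """
    with zipfile.ZipFile(wheel) as zf:
        names = zf.namelist()
    return [
        name
        for name in names
        # Compare only the last path component against each pattern.
        if any(fnmatch.fnmatch(name.rsplit("/", 1)[-1], pat) for pat in patterns)
    ]
```

Calling `unexpected_wheel_files(...)` on a wheel from your own `dist/` directory returns the members that look like Cython sources or generated C++; an empty list means none of the patterns matched.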
We have an experimental hatchling plugin, you could try that if you want. The Python part of scikit-build-core is designed to look like hatchling, though we are currently missing "force-include". The package copy system is very similar.
> For pure python files, it's just copying the files. See wheel.packages
If you editable install, it makes symlinks, which is better than if you put the Python file copying into CMake. If you do need something more complex, though, it's fine to use CMake to copy files.
The SDist is everything not in .gitignore by default. If generated files are sneaking in, add them to .gitignore (currently top-level only; #726 will be worked on soon! Free-threading work has been distracting us from working on some of these issues). If you have something that doesn't need to be in the SDist, you can manually exclude it (but docs & tests should be there for downstream packagers!).
The wheel is everything in the list of packages (which defaults to the name of the package, possibly prefixed by src/ or python/) and everything CMake installs. You can use wheel.exclude to filter more items out. If you put the Cython files in the source (which is a pretty common Cython pattern), then ["*.pyx", "*.cpp"] is probably a good wheel.exclude to have[^1].
[^1]: You might be able to keep the .cpp files out of the source tree by making sure they're only generated in the build directory.
Sample-projects probably needs contributions. :) Also fine with docs contributions. We've kind of kept Cython out of the docs assuming that everything would need to be rewritten after https://github.com/scikit-build/cython-cmake is ready and proposed to Cython proper.
> ala setuptools find_packages, or ??
FYI, in setuptools this is MANIFEST.in, which is also a long list of includes/excludes.
Also, on rereading, I think the main issue is that your nested .gitignores are not being respected; if you can use sdist.exclude for just a bit longer, the nested gitignore issue should be a target for the next release.
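Until then, one stopgap is to generate the sdist.exclude entries from the nested .gitignore files rather than maintaining them by hand. The helper below is a hypothetical sketch, not part of scikit-build-core. It assumes that sdist.exclude accepts gitignore-style patterns relative to the project root, and it deliberately skips negated (`!`) patterns, so check the result before relying on it.

```python
from pathlib import Path


def nested_gitignore_excludes(root):
    """Rewrite patterns from nested .gitignore files relative to the project root."""
    root = Path(root)
    excludes = []
    for ignore_file in sorted(root.rglob(".gitignore")):
        rel_dir = ignore_file.parent.relative_to(root).as_posix()
        if rel_dir == ".":
            continue  # skip the top-level .gitignore; only nested files are rewritten
        for line in ignore_file.read_text().splitlines():
            pattern = line.strip()
            # Skip blanks, comments, and negations (not handled by this sketch).
            if not pattern or pattern.startswith(("#", "!")):
                continue
            if "/" in pattern.rstrip("/"):
                # A slash at the start or in the middle anchors the pattern
                # to the directory that holds the .gitignore.
                excludes.append(f"{rel_dir}/{pattern.lstrip('/')}")
            else:
                # Otherwise the pattern matches at any depth below that directory.
                excludes.append(f"{rel_dir}/**/{pattern}")
    return excludes
```

Running `nested_gitignore_excludes(".")` from the project root gives a list of strings that can be pasted into `sdist.exclude` in pyproject.toml.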