setuptools_scm
Add option to skip auto-inclusion of all tracked files.
After spending a long time trying to figure out why my distributions were suddenly too large to be uploaded to PyPI, I tracked it down to my recent inclusion of setuptools_scm. Many people may want to use setuptools_scm to automatically set the package version but may not want to override the default logic of which files are included in an sdist. Rather than requiring users to manually exclude all unwanted files in MANIFEST.in, it would be nice to have a way of switching off this feature.
Indeed, this is hideous. Using MANIFEST.in instead of setuptools discovery and exclusion mechanisms is not only tedious, but quite unexpected. In that sense related: https://github.com/pypa/setuptools/issues/3260
@bertsky MANIFEST.in is the setuptools' exclusion mechanism. It controls what's included in sdists, which arguably should be everything because it should be possible to build installables out of an sdist.
The declarative config you're implicitly referring to through the linked issue is for controlling what's included in wheels — the files that end up in site-packages/.
@webknjaz that is surprising (to say the least), and IMO directly contradicts the statements made in setuptools documentation:
Automatically include all relevant files in your source distributions, without needing to create a MANIFEST.in file, and without having to force regeneration of the MANIFEST file when your source tree changes [1].
The setuptools User Guide does not state anywhere that the package discovery and data file inclusion configs are only relevant for wheels. Furthermore, AFAICS this is also not what is implemented: If I python -m build ., then my source tarballs behave pretty much the same as my wheels regarding inclusion or exclusion of files.
@bertsky AFAIK only some recent setuptools versions started discovering some subsets of files to include. I think that's only enabled with the PEP 621 metadata declaration method, and perhaps only with a src-layout. I don't think that's universal. Plus, the lists of files to include in sdists and wheels should be different. Wheels contain everything that ends up in site-packages; that's what installers use. Sdists should contain at least everything needed to build wheels; they are never used for installation directly. But sdists are also used in other contexts: downstream redistributors use them as the source of truth, building RPMs out of them (through a wheel) but also building the docs and running the tests. These things should not be included in wheels (because they shouldn't end up at the top level of site-packages/) but are very useful in sdists.
A lot of people expect sdists to be the equivalent of a Git checkout. For me, it's because I want my sdists to be downstream-friendly, so I even test them like that in CI, avoiding building wheels from Git in most cases. So putting everything Git-tracked into sdists makes sense to me. If there are some gigantic files that are not necessary for building wheels or for downstream testing/docs, those could be excluded via the manifest as an exception, but in general I don't bother: the majority of people will only hit wheels and won't have to build from sdists.
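The "ship everything Git tracks, carve out rare exceptions" policy can be sketched as a plain filter. This is a simplified illustration, not how setuptools evaluates MANIFEST.in: the function name select_sdist_files and the file list are hypothetical, and the shell-style patterns only roughly approximate directives such as global-exclude or prune.

```python
from fnmatch import fnmatch


def select_sdist_files(tracked, exclude_patterns):
    """Hypothetical helper: keep every tracked file unless an exclusion pattern matches it."""
    return [
        path
        for path in tracked
        if not any(fnmatch(path, pattern) for pattern in exclude_patterns)
    ]


# Made-up list standing in for the output of `git ls-files`.
tracked = [
    "src/pkg/__init__.py",
    "tests/test_pkg.py",
    "docs/index.rst",
    "data/huge-fixture.bin",
]

# Only the gigantic fixture is excluded; tests and docs stay in the sdist for downstreams.
print(select_sdist_files(tracked, ["data/*.bin"]))
```

Tests and docs survive the filter, which matches the downstream-friendly goal; only the explicitly listed exception is dropped.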
FWIW I think in many cases it'd look like setuptools' autodiscovery behaves the same for sdists and wheels. This might be because of how building/installing from sdist works.
If you pip install some.tar.gz, it'll first build a wheel, cache it and unzip the wheel into site-packages.
Running python -m build emulates this in that it first builds an sdist from a source checkout (Git usually), and then it'll untar that into a temporary/disconnected location on disk, and build the wheel out of it.
If you start adding flags like --sdist/--wheel to that command, both builds will be performed from the Git checkout.
With that, if you forget to include an important file into sdist, your CI/release pipeline will build both and they will work in that setting because some extra files from the Git checkout happen to exist on disk at the time. But if anybody (end-users, downstreams etc.) attempts to install from such an sdist, they may end up with broken wheels or building might not even succeed.
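The failure mode above can be reproduced without a real build backend. In this sketch, can_build_wheel is a hypothetical stand-in that only checks that the files a build needs are present, and the project layout, the pkg-1.0 prefix and schema.json are made up for illustration. Building from the checkout succeeds, while building from the unpacked sdist fails because the manifest forgot one file.

```python
import tarfile
import tempfile
from pathlib import Path


def make_sdist(checkout: Path, members: list[str], archive: Path) -> None:
    """Pack only the listed files into a tarball under a name-version/ prefix."""
    with tarfile.open(archive, "w:gz") as tar:
        for rel in members:
            tar.add(checkout / rel, arcname=f"pkg-1.0/{rel}")


def can_build_wheel(source_dir: Path, required: list[str]) -> bool:
    """Hypothetical stand-in for a wheel build: succeeds only if every required file exists."""
    return all((source_dir / rel).is_file() for rel in required)


with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    checkout = tmp / "checkout"
    required = ["pyproject.toml", "src/pkg/__init__.py", "src/pkg/schema.json"]
    for rel in required:
        path = checkout / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")

    # The "manifest" forgets schema.json: only the first two files go into the sdist.
    archive = tmp / "pkg-1.0.tar.gz"
    make_sdist(checkout, required[:2], archive)

    # Unpack into a disconnected location, like a build frontend does.
    unpacked = tmp / "unpacked"
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:
        tar.extractall(unpacked, **extract_kwargs)

    from_checkout = can_build_wheel(checkout, required)
    from_sdist = can_build_wheel(unpacked / "pkg-1.0", required)

print(from_checkout, from_sdist)  # True False
```

CI that only builds from the checkout sees True; anyone starting from the published sdist sees False.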
That said, I haven't looked into what setuptools discovers today. I think it's nice that it exists for the first-time users of setuptools but I'd prefer to still have something that I can rely on consistently. And that's what this plugin does for me.
cc @abravalheri do you have any insight?
The way the setuptools.file_finders entry-point was designed many years ago is to always include all files yielded by the plugin. So it is very hard to change that in a backwards-compatible way without breaking the ecosystem.
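To make the contract concrete, here is a hypothetical finder registered under the setuptools.file_finders group: a function that takes a directory name and yields file paths. This is not setuptools_scm's code. os.walk stands in for "list SCM-tracked files", and the EXCLUDED patterns and module name my_finder are made up. Because setuptools includes whatever a finder yields (MANIFEST.in exclusions aside), a selective finder has to filter before yielding.

```python
import os
from fnmatch import fnmatch

# Hypothetical exclusion patterns; not a setuptools or setuptools_scm option.
EXCLUDED = ["*.bin", "*.mp4"]


def find_files(dirname: str = ""):
    """Hypothetical setuptools.file_finders hook.

    Yields every file under dirname except files in hidden directories and
    names matching EXCLUDED. A real SCM finder would list tracked files
    instead (for Git, e.g. via `git ls-files`); os.walk stands in for that.
    """
    root = dirname or os.curdir
    for current, dirs, files in os.walk(root):
        # Prune hidden directories such as .git in place so os.walk skips them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if any(fnmatch(name, pattern) for pattern in EXCLUDED):
                continue  # never yielded, so setuptools never sees this path
            yield os.path.normpath(os.path.join(current, name))


# Registration sketch for pyproject.toml (module name "my_finder" is hypothetical):
# [project.entry-points."setuptools.file_finders"]
# my_finder = "my_finder:find_files"
```

Every finder installed in the build environment is consulted, so one plugin's filtering does not remove files that another plugin yields.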
Ideally exclude-package-data could work for that, but as pointed out in https://github.com/pypa/setuptools-scm/issues/516#issuecomment-2387263707, it is broken by https://github.com/pypa/setuptools/issues/3260. And that issue is also problematic to solve because there is a tug-of-war with https://github.com/pypa/setuptools/issues/3340.
So if you want to achieve finer selection of files, the existing approach is:
- Opt out of include-package-data by explicitly setting it to false.
- Use package-data to explicitly list the relevant files/globs.
Notes
- The use of hyphens or underscores in the configuration parameters depends on whether you are using setup.cfg/setup.py (underscore) or pyproject.toml (hyphen).
- It is also important not to forget that all directories are considered importable packages by the Python import machinery, regardless of whether they contain Python files or not. So also ensure those directories are listed in the packages configuration to avoid the warnings in https://github.com/pypa/setuptools/issues/3340. Most of the time, it is possible to omit the packages configuration parameter completely and use the automatic discovery available in setuptools>61.2.
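A rough way to see which directories the import machinery would accept is to walk the tree and keep every directory whose name is a valid identifier. This is an illustration only, not setuptools' discovery code (setuptools.find_namespace_packages and the automatic discovery cover the real cases), and implicit_packages is a hypothetical name. A data-only directory such as pkg/data shows up too, which is why it belongs in packages.

```python
import os


def implicit_packages(root: str) -> list[str]:
    """Roughly list the directories under root that Python could import as
    packages (regular or namespace), whether or not they contain .py files.

    Illustration only; not setuptools' discovery algorithm.
    """
    found = []
    for current, dirs, _files in os.walk(root):
        # Only identifier-like names are importable; this also drops hidden
        # directories such as .git and names like my-assets.
        dirs[:] = sorted(d for d in dirs if d.isidentifier())
        rel = os.path.relpath(current, root)
        if rel != os.curdir:
            found.append(rel.replace(os.sep, "."))
    return found
```

For a tree with pkg/__init__.py, pkg/data/schema.json and pkg/my-assets/logo.txt, this returns pkg and pkg.data: the data directory counts even though it holds no Python code.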
Related:
We should be able to use setuptools_scm but not use the version specification code.
Yes, you can currently choose not to set the version via setuptools_scm in pyproject.toml, but it still tries to figure out the version whether you use it or not, and if your SCM isn't tagged "correctly" it fails with an error.
See: #873 for detail.
https://github.com/pypa/setuptools/pull/5056 is a starting point towards enabling this; I need to ensure our configuration is available on the distribution so that I can reuse it.