setuptools
package data in subdirectory causes warning
setuptools version
62.3.2
Python version
3.10
OS
Debian with conda
Additional environment information
No response
Description
pyopencl has OpenCL files and some headers in a subdirectory pyopencl/cl and they are included as package_data so that the python module can find them.
package_data={
"pyopencl": [
"cl/*.cl",
"cl/*.h",
"cl/pyopencl-random123/*.cl",
"cl/pyopencl-random123/*.h",
]
},
With new setuptools, there is a warning saying
############################
# Package would be ignored #
############################
Python recognizes 'pyopencl.cl' as an importable package, however it is
included in the distribution as "data".
This behavior is likely to change in future versions of setuptools (and
therefore is considered deprecated).
Please make sure that 'pyopencl.cl' is included as a package by using
setuptools' `packages` configuration field or the proper discovery methods
(for example by using `find_namespace_packages(...)`/`find_namespace:`
instead of `find_packages(...)`/`find:`).
You can read more about "package discovery" and "data files" on setuptools
documentation page.
cc @inducer
Expected behavior
No warning
How to Reproduce
- clone https://github.com/inducer/pyopencl
- install numpy
- Run
python setup.py install
Output
$ python setup.py install
running install
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing pyopencl.egg-info/PKG-INFO
writing dependency_links to pyopencl.egg-info/dependency_links.txt
writing requirements to pyopencl.egg-info/requires.txt
writing top-level names to pyopencl.egg-info/top_level.txt
reading manifest file 'pyopencl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'pyopencl.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/build_py.py:153: SetuptoolsDeprecationWarning: Installing 'pyopencl.cl' as data is deprecated, please list it in `packages`.
!!
############################
# Package would be ignored #
############################
Python recognizes 'pyopencl.cl' as an importable package, however it is
included in the distribution as "data".
This behavior is likely to change in future versions of setuptools (and
therefore is considered deprecated).
Please make sure that 'pyopencl.cl' is included as a package by using
setuptools' `packages` configuration field or the proper discovery methods
(for example by using `find_namespace_packages(...)`/`find_namespace:`
instead of `find_packages(...)`/`find:`).
You can read more about "package discovery" and "data files" on setuptools
documentation page.
!!
check.warn(importable)
running build_ext
Hi @isuruf thank you for bringing this subject for discussion.
This change (and the warning) was intentional, so in the future we will be able to solve a related problem with find_packages(..., exclude=...) and include_package_data.
Since PEP 420, there is no real way to differentiate a folder from a package in Python... People who install pyopencl will be able to successfully run import pyopencl.cl, which means that the interpreter does consider pyopencl.cl a package. Nevertheless, the configuration file seems to want to consider cl a "data directory", which is not currently a well-established concept in the ecosystem...
Would it be possible for you guys to use find_namespace_packages() as indicated in the warning message?
I'm facing the same problem, and I'm not clear on the nature of the change suggested by @abravalheri. Are you saying that directories in the package hierarchy that contain only data files and no Python code should be included in the project's list of packages despite not actually being Python packages?
Hi @jwodder, thank you very much for the input. Please find my comments below:
directories in the package hierarchy that contain only data files and no Python ... not actually being Python packages
I think that this assumption[^1] no longer holds.
According to PEP 420, package is a term that "refers to Python packages as defined by Python’s import statement".
Evidence that a directory without Python code is treated by the Python import statement as a package is given by the following snippets:
rm -rf /tmp/example
mkdir -p /tmp/example/mypkg1
touch /tmp/example/mypkg1/file.txt
mkdir -p /tmp/example/mypkg2/subpkg
touch /tmp/example/mypkg2/__init__.py
touch /tmp/example/mypkg2/subpkg/file.txt
python3.10
>>> import sys
>>> sys.path.append("/tmp/example")
>>> import mypkg1
>>> mypkg1
<module 'mypkg1' (<_frozen_importlib_external._NamespaceLoader object at 0x7f3683fac2e0>)>
>>> mypkg1.__path__
_NamespacePath(['/tmp/example/mypkg1'])
>>> mypkg1.__loader__.is_package('mypkg1')
True
>>> import mypkg2.subpkg
>>> mypkg2.subpkg.__path__
_NamespacePath(['/tmp/example/mypkg2/subpkg'])
>>> mypkg2.subpkg.__loader__.is_package('mypkg2.subpkg')
True
Note in this example that both mypkg1 and mypkg2.subpkg can be imported, and are treated as regular packages (both are imported as module objects with the attribute __path__ set), despite not containing any Python file.
I added mypkg2.subpkg to the example to demonstrate that it does not matter whether or not the directory is nested inside another "traditional package".
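For readers who prefer a single script, the shell and REPL steps above can be reproduced with the standard library alone. This is only a minimal sketch: the directory name datadir_demo and the helper import_data_only_dir are made up for illustration, and the tree is created in a temporary directory instead of /tmp/example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_data_only_dir(name="datadir_demo"):
    """Import a directory that holds only a text file and describe the result."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = Path(root) / name
        pkg_dir.mkdir()
        (pkg_dir / "file.txt").write_text("not Python code\n")

        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            return {
                # Packages are modules that carry a __path__ attribute.
                "has_path": hasattr(module, "__path__"),
                # There is no __init__.py, so there is no file to point at.
                "file": getattr(module, "__file__", None),
                "locations": [Path(p).name for p in module.__path__],
            }
        finally:
            # Undo the changes so the demo leaves no trace in the interpreter.
            sys.path.remove(root)
            sys.modules.pop(name, None)


info = import_data_only_dir()
print(info)
```

The import succeeds even though the directory contains no Python file, which is the behaviour the warning is based on.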
Moreover, the ecosystem also considers those folders, as demonstrated by the implementation of importlib.resources:
>>> import importlib.resources
>>> importlib.resources.contents('mypkg1')
['file.txt']
>>> importlib.resources.contents('mypkg2.subpkg')
['file.txt']
I understand that not all tools in the ecosystem embrace PEP 420 by default and that sometimes it might be frustrating to see tools and interfaces evolving. That is the reason why I decided to add a deprecation warning first and a transition period, instead of simply changing things in a major version bump.
This change in setuptools was implemented due to a single, very pragmatic, objective: to fix internal inconsistencies (as presented in #3260), which are caused by this division between packages and directories that don't contain Python code (which is not a division recognized by Python's import system).
For the time being, I don't plan on removing the deprecation warning, since the only way I know how to fix the internal inconsistencies requires us to eliminate this arbitrary division. However, if anyone in the community finds a different way of solving the problem and is willing to provide a backward-compatible PR, I am more than happy to consider an alternative.
[^1]: Namely the assumption that a directory with no Python code is not a package.
@abravalheri So you are saying that "data-only packages" should be included in the project's list of packages? Is that the only change you're currently recommending to authors of such projects? Does it matter whether the data files are included via package_data or MANIFEST.in+include_package_data?
Yes, I recommend adding all the sub-directories to the list of packages, even if they only include data files. On the bright side that can be done with find_namespace_packages (setup.py) or find_namespace: (setup.cfg).
It should make no difference if you are using package_data or MANIFEST.in (although I am not sure if users are going to see the warning if they don't use include_package_data=True).
Do you have a suggestion about how to improve the warning message to make it more clear?
@abravalheri I find the first paragraph of the warning to be a bit confusing. I would rewrite it from:
Python recognizes '...' as an importable package, however it is included in the distribution as "data". This behavior is likely to change in future versions of setuptools (and therefore is considered deprecated).
to something more like:
Python recognizes '...' as an importable package, but it is not listed in "packages". It is included in the distribution because it contains package data, but this behavior is likely to change in future versions of setuptools (and therefore is considered deprecated).
I was following this issue because I was very confused (as a newbie) by the warning that came up. Can I suggest a further edit to the warning message: Currently {importable!r} is only added to the distribution because it may contain data files... --> Currently {importable!r} has been automatically added to the distribution because it may contain data files... This is my understanding of what is actually happening (i.e. automatic inclusion). If this suggested change is wrong, then I'm still not following how this works...
Thank you very much @cdfarrow, this suggestion looks good to me. I will implement it probably later this week.
How about the case where we want *.py files in a subdirectory included as data files? This occurs in a couple projects for me that define a plug-in interface. Basically there are a couple python files in a package subdirectory that I provide as examples of plug-in implementation. These are also loaded through the common plug-in interface via importlib.
Will the inclusion as a namespace package still work in this case? Anything special to be aware of?
If it makes a difference, I'm using the MANIFEST.in method with include_package_data=True.
Also, I created this SO question about this issue when I first encountered it. I guess I should have come here first! :)
Hi @mhkline. The warning is not about the files themselves, but about the directory. Right now there is no concept of a "data directory" for the package ecosystem.
Since PEP 420, effectively all directories are packages regardless of containing a __init__.py file or not. With this warning, my intention is to align the expectations of the users with the behaviour we observe in Python.
If you want the directory to be included in the distribution, you can include it via the packages= configuration. find_namespace_packages() in setup.py or find_namespace: in setup.cfg will do that for you, and probably make the warning go away.
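The difference between the two discovery helpers can be checked on a throwaway tree. The sketch below assumes setuptools is installed and uses a hypothetical, much reduced layout loosely modelled on the one in this issue (a regular package with one data-only subdirectory); it is not the real pyopencl tree.

```python
import tempfile
from pathlib import Path

from setuptools import find_namespace_packages, find_packages


def discover(root):
    """Return what each discovery helper reports for the tree under root."""
    return {
        "find_packages": sorted(find_packages(where=root)),
        "find_namespace_packages": sorted(find_namespace_packages(where=root)),
    }


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: pyopencl/__init__.py plus pyopencl/cl/example.cl
    (Path(root) / "pyopencl" / "cl").mkdir(parents=True)
    (Path(root) / "pyopencl" / "__init__.py").touch()
    (Path(root) / "pyopencl" / "cl" / "example.cl").touch()
    found = discover(root)

print(found)
```

Only find_namespace_packages reports the data-only directory, so passing its result to packages= lists that directory explicitly.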
The above discussion covers how to go about building a correct distribution without using deprecated setuptools functionality. Thanks all for that!
My question here is more philosophical than practical. It seems like setup(packages=...) and MANIFEST.in are at least partially overlapping in functionality. In our MANIFEST.in we specify exactly what files and directories to include in distribution bundles. It seems redundant to have to specify similar information (the directories part) via the packages= configuration. Couldn't that list of packages theoretically be inferred from the MANIFEST.in contents?
Hi @kanderso-nrel , you can remove the packages= configuration and setuptools will try to infer which packages are provided.
There are a few limitations:
- If you explicitly provide packages= or py_modules=, this "autodiscovery/automatic inference" feature will be turned off. The same thing happens if your setup.py specifies ext_modules=.
- If you are using a flat-layout and have multiple folders at the root of your directory (other than the conventional docs/tests), some of these folders might be accidentally recognised as namespace packages. For this reason setuptools will halt if more than one top-level package is automatically inferred. Projects using the src-layout should be safe in the majority of cases.
Yes, I recommend adding all the sub-directories to the list of packages, even if they only include data files.
I find this unexpected and unintuitive. Why make a namespace package when you have no need or intent to use it as a namespace package? IMO cleaner recommendations could be:
- have data files inside an existing package (pyopencl/example.c)
- add __init__.py to mark the dir as a regular package (pyopencl/cl/__init__.py + pyopencl/cl/example.c)
- support data files in subdirectories (pyopencl/cl/example.c) without considering them namespace packages (probably not possible without defining a new concept for package data directory)
This topic may be worth discussing with other tools maintainers on the packaging forum.
Why make a namespace package when you have no need or intent to use it as a namespace package?
Hi @merwok, since the Python runtime will effectively consider any directory without an __init__.py a namespace package, if a dev organizes files in such a directory, the dev is effectively making use of a namespace package... If a dev has a use for directories without an __init__.py, the dev will automatically have a use for a namespace package...
Am I understanding correctly here? Do you object to listing namespace packages that already exist in a project in the packages configuration?
My suggestion was to simply align the configuration to the way Python effectively interprets things...
(Of course if the community wants to change how this works a discussion under the Packaging topic in the Python discourse is always a good approach).
My point is that the dev is not making use of a namespace package because it is not intended that pyopencl.cl can be extended by other distributions providing modules in that namespace. It really is a directory containing data files.
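To make the "extended by other distributions" point concrete, here is a small standard-library sketch. Two temporary roots stand in for two installed distributions that both ship a directory with the same name; the helper portions and the names ns_demo and regular_demo are invented for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def portions(name, with_init):
    """Count the directories that end up on <name>.__path__."""
    with tempfile.TemporaryDirectory() as root_a, tempfile.TemporaryDirectory() as root_b:
        for root in (root_a, root_b):
            (Path(root) / name).mkdir()
            (Path(root) / name / "data.txt").touch()
        if with_init:
            # Only the first root gets an __init__.py (a regular package).
            (Path(root_a) / name / "__init__.py").touch()

        sys.path[:0] = [root_a, root_b]
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            return len(list(module.__path__))
        finally:
            sys.path.remove(root_a)
            sys.path.remove(root_b)
            sys.modules.pop(name, None)


namespace_portions = portions("ns_demo", with_init=False)
regular_portions = portions("regular_demo", with_init=True)
print(namespace_portions, regular_portions)
```

Without an __init__.py the import system merges both directories into one package; with an __init__.py the first match wins and the second directory is ignored. A directory such as pyopencl/cl gets the first behaviour whether or not the author intended it.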
I'm faced with the inverse situation: my packages= and MANIFEST.in are well defined, with exactly the files I want included in my wheel. There are files within the package that I don't want to see included (e.g. tests, sass files, non minified js, etc.). Now, setuptools adds them to the wheel, with no "Stop doing this, I know what I'm doing" flag that I can see. Am I missing something?
I get the same behavior if, instead of specifying packages, I try to find then exclude the unwanted packages. :cry:
Hi @olemoign, can you share a link to your project source code (or a similar reproduction)? I would need to have a look at the configuration/file structure to understand what is going on. In theory, a well-specified MANIFEST.in and [options] packages= should prevent files from being added...
You can open a new discussion topic with the extra information to avoid "overloading/hijacking" this issue thread.
@abravalheri I stumbled on a bunch of build warnings buried in CI logs that I was reviewing randomly and I am puzzled..... can you articulate what end-user benefit you expect from this change (e.g. for package maintainers that rely on setuptools)?
Personally I do not think such a warning can be easily seen. My wheels contain thousands of files and the warnings are just drowned in CI log files that are never looked at unless the build fails. So I am not convinced that this warning would have much effect.
You wrote:
Right now there is no concept of a "data directory" for the package ecosystem.
IMHO the current behaviour is the de-facto way that package maintainers understand and have grown to rely on. e.g. when you "include_package_data" anything (file or dir) in the tree of included packages is included.
Since PEP 420, effectively all directories are packages regardless of containing a __init__.py file or not. With this warning, my intention is to align the expectations of the users with the behaviour we observe in Python.
What is the Python behaviour there beyond the fact that files in the package tree are accessible? I could not find anything about data files or data directories mentioned in PEP 420.
Now, the proposed future behaviour does not seem entirely consistent: when there are data files in a directory with Python code (either a legacy init-style or "namespace" package) these are included but data files in a subdirectory of the same would be not included, e.g., some data files would need an intervention and some data files would not? Unless a subdir of a package dir is not a Python identifier (e.g. with a dash as in "foo-bar"), and then this is included without warning.
So if I understand the to-be behaviour correctly based on deprecation messages this would mean this (assuming in all cases that include_package_data is True):
- plain data files under a legacy or namespace package directory are included
- directories with a non-valid Python identifier name under a legacy or namespace package directory are included
- directories with a valid Python name under a legacy or namespace package directory will not be included and would require special treatment (e.g. adding an __init__.py) or a declaration such that they are treated as namespace packages.
I am not sure that this would contribute to a better and consistent user experience.
can you articulate what end-user benefit do you expect with this change? (e.g. package maintainers that rely on setuptools)?
Right now the configurations debated here work in a "grey area" and mess with the core Python concept of a package. By changing it, I hope to make things easier to understand, obvious and more well-defined. I also plan to make it possible for users to do things like packages=find_packages(exclude=["mypkg.tests*"]) while introducing a more clear behaviour of include_package_data=True.
IMHO the current behaviour is the de-facto way that package maintainers understand and have grown to rely on. e.g. when you "include_package_data" anything (file or dir) in the tree of included packages is included.
I am afraid that is not a given. Some package maintainers may understand things in this way, but the reality is that it is confusing and that there is no such concept as a "data directory" in the Python ecosystem. All directories are packages, and setuptools does have a configuration that captures this concept: packages. If a directory does not have a corresponding entry in packages, this should mean that the developer does not want that package to be included in the distribution.
What is the Python behaviour there beyond the fact that files in the package tree are accessible?
The Python behaviour is to consider such directories packages. You can even import them.
Now, the proposed future behaviour does not seem entirely consistent: when there are data files in a directory with Python code (either a legacy init-style or "namespace" package) these are included but data files in a subdirectory of the same would be not included, e.g., some data files would need an intervention and some data files would not?
I am failing to see the inconsistency. If we stop thinking for a moment in terms of "sub-directories", we can see that when include_package_data is True, all non-Python files that inhabit a package listed in the packages configuration will be included. If your "sub-directory" has a corresponding entry in packages, its files should be included also. Sub-directories are only different packages.
Unless a subdir of a package dir is not a Python identifier (e.g. with a dash as in "foo-bar"), and then this is included without warning.
To be honest, I also wanted to have a warning for those cases. But I cannot sustain the argument that those packages should also be listed in the packages configuration, because they are not "importable" using a from ... import <name of the directory here> statement. Maybe in the future we can also make this change based on the fact that they are importable with importlib.import_module, but this is a weaker argument.
So if I understand the to-be behaviour correctly based on deprecation messages this would mean this (assuming in all cases that include_package_data is True) ...
I think it is easier to think in the following terms:
- If you can successfully execute a from ... import <name of the directory here> statement, you should include a corresponding entry in the packages configuration unless you don't want that directory to be part of the distribution. Or leave it up to setuptools to perform auto-discovery.
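The distinction drawn above between names usable in an import statement and names reachable only through importlib.import_module can be sketched with the standard library. The helpers usable_in_import_statement and importable_by_string, and the directory names, are hypothetical; this is not how setuptools itself performs the check.

```python
import importlib
import keyword
import sys
import tempfile
from pathlib import Path


def usable_in_import_statement(dirname):
    """True if dirname can appear in `from ... import <dirname>`."""
    return dirname.isidentifier() and not keyword.iskeyword(dirname)


def importable_by_string(parent, child):
    """Try importlib.import_module on parent.child, a data-only directory."""
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / parent / child).mkdir(parents=True)
        (Path(root) / parent / "__init__.py").touch()
        (Path(root) / parent / child / "data.txt").touch()
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            importlib.import_module(f"{parent}.{child}")
            return True
        except ImportError:
            return False
        finally:
            sys.path.remove(root)
            stale = [m for m in sys.modules if m == parent or m.startswith(parent + ".")]
            for mod in stale:
                del sys.modules[mod]


plain_ok = usable_in_import_statement("cl")
dashed_ok = usable_in_import_statement("foo-bar")
dashed_by_string = importable_by_string("dash_demo", "foo-bar")
print(plain_ok, dashed_ok, dashed_by_string)
```

A directory such as foo-bar fails the first check but can still be loaded by string, which matches the "weaker argument" mentioned earlier in this reply.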
I am not sure that this would contribute to a better and consistent user experience.
Maybe we disagree here, but my intention is to have less "grey area" and make the configuration more obvious. I argue that requiring a corresponding entry in packages for all the directories that can be successfully imported is clearer and more predictable.
The key take away is that the existing configuration mechanisms for package/package data work in non-obvious ways and mix a lot of concepts. The intention of the change is to better delineate the contours of these concepts and make things more obvious.
The only way that I can think of to make this happen without completely changing the configuration system is to improve the boundaries between the package and the include_package_data configurations.
Alternatively, we could revamp the configuration system and think about other concepts that could be easier to deal with. I am very receptive of this idea, but it is not in my priorities right now. If anyone else would like to champion this, I think it would be a positive change.
@abravalheri
...I also plan to make it possible for users to do things like packages=find_packages(exclude=["mypkg.tests*"]) while introducing a more clear behaviour of include_package_data=True.
This is the situation I'm in. I've made an example repo here. We have tests inside the package, and want to (conditionally) exclude them from distribution. But it seems patterns in exclude are ignored anyway: we get the aforementioned warning and the tests package is added.
Practically, we're evaluating moving tests out next to package and conditionally including them with some extra logic in setup.py, but curious if it's worth the effort.
Edit: clarify this is a bit confusing, due to the example of package discovery, specifically of this use case
@milesgranger you are correct. This is a bug (https://github.com/pypa/setuptools/issues/3260).
The problem is that we cannot resolve this bug without first deprecating and removing the behaviour described in this issue (you can see that there are still a lot of people depending on it...).
I suppose you can have a workaround by one of the following:
- Set exclude_package_data to remove all files in the tests folder, or
- Set include_package_data=False and add package_data with more specific file patterns.
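As a rough illustration, the two workarounds could translate into keyword arguments for setup() along these lines. Everything here is an assumption made for the example: the package name mypkg, the mypkg/tests and mypkg/data layout, the glob patterns, and the helper setup_kwargs. The patterns have not been checked against any particular setuptools version and would need adapting to a real project.

```python
# Hypothetical project: package "mypkg" with tests in mypkg/tests and
# data files in mypkg/data. All names and patterns are illustrative.

# Workaround 1: keep include_package_data=True and filter the tests out again.
OPTION_EXCLUDE = {
    "include_package_data": True,
    "exclude_package_data": {"mypkg": ["tests/*"]},
}

# Workaround 2: turn automatic inclusion off and list data files explicitly.
OPTION_EXPLICIT = {
    "include_package_data": False,
    "package_data": {"mypkg": ["data/*.yaml"]},
}


def setup_kwargs(option, packages):
    """Merge one workaround with the packages list destined for setup()."""
    return {"name": "mypkg", "packages": list(packages), **option}


# In a real setup.py, packages would typically come from
# find_packages(exclude=["mypkg.tests*"]) and the dict would be passed
# on as setup(**kwargs).
kwargs = setup_kwargs(OPTION_EXPLICIT, ["mypkg"])
print(kwargs)
```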
Sorry for the trouble, if we change things right now, several projects in the ecosystem might break (so we have to go through the deprecation period).
@abravalheri thanks for the response! Those are useful workarounds.
I just wanted to give some feedback on the current warning about what caused us some confusion (colleague of @milesgranger, so based on the same example from the above comment). So we run into the issue that find_(namespace)_packages(exclude=["mypkg.tests*"]) doesn't work if also including package data. I certainly understand the reason for this now, but so the warning we currently see:
Python recognizes 'package.tests' as an importable package, but it is not listed in the packages configuration of setuptools.
'package.tests' has been automatically added to the distribution only because it may contain data files, but this behavior is likely to change in future versions of setuptools (and therefore is considered deprecated).
Please make sure that 'package.tests' is included as a package by using the packages configuration field or the proper discovery methods (for example by using find_namespace_packages(...)/find_namespace: instead of find_packages(...)/find:).
A few confusing aspects:
- The message suggests to use find_namespace_packages instead of find_packages, but you also get the same warning if you actually use find_namespace_packages (and using that also doesn't fix the specific issue we have)
- The text 'package.tests' has been automatically added to the distribution only because it may contain data files is also confusing to me, because if using standard discovery without include_package_data=True, this module would actually be included as well simply because it is a python module. So while I understand that because of include_package_data=True, this directory would also be included if it wasn't a python module, the current text is quite confusing if the directory at hand is a python module.
- The suggestion about how to solve this warning is only about how you can ensure to include the package, so that if the behaviour would change in the future to no longer include this directory, your package is not broken. It might be useful to add some text about what to do if you do not want to include the directory (now and in the future).
(and to be clear: I am a package author myself having to deal with trying to write informative warnings regularly, I certainly know it is not always easy or even possible to provide the perfect warning for each and every use case that might trigger it .. ;))
Thank you very much for the feedback @jorisvandenbossche. Indeed the scenario you are describing is very tricky. The deprecation was introduced to specifically solve this problem, but we have to wait out the deprecation period before the problem can be 100% solved... (backward bug-compatibility).
Let me try to clarify some things in your last post, and then maybe we can work on a better warning message together?
The message suggests to use find_namespace_packages instead of find_packages, but you also get the same warning if you actually use find_namespace_packages (and using that also doesn't fix the specific issue we have)
Your case is special because the package package.tests is intentionally excluded, this warning message should not apply to you.
In your case something like the following should be better:
**NOTE**: If you are already explicitly excluding `package.tests` via
`find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
you can try instead to use `include-package-data=False` in combination with
a more fine grained `package-data` configuration. This limitation will likely
be solved in the future once the deprecation period is over.
Would adding this paragraph help?
this module would actually be included as well simply because it is a python module.
If you are setting exclude (and include-package-data=False), this module would not be included, right?
if the directory at hand is a python module.
All directories are always Python packages, even if they don't include a Python file.
In your case something like the following should be better: .. Would adding this paragraph help?
Yes, that would certainly help. Although in our case we resorted to using exclude_package_data, which might be a bit easier to do (as you just need to specify there what you also have in exclude of find_packages) than using include_package_data=False and having to update how actual data gets included.
this module would actually be included as well simply because it is a python module.
If you are setting exclude (and include-package-data=False), this module would not be included, right?
In our case, yes, because we explicitly had exclude=["package.tests*"]. But when you don't use exclude, then this submodule would be included. Now, I see that in that case, you don't get this warning, so then indeed my point is moot (since there is no warning in that case, it can't cause the confusion that I imagined to have in that case ;))
Maybe part of the confusion also comes from "but it is not listed in the packages configuration of setuptools." in the first sentence. Maybe that also could include something like "not listed in .. or is excluded"? (I know that because it is excluded, it's not listed in the packages, but that might make it more obvious that this warning can have to do with trying to exclude a package)
if the directory at hand is a python module.
All directories are always Python packages, even if they don't include a Python file.
Yes, I understand that now. But I think that's not the mental model of most people reading this warning ..
BTW, it's also not fully clear to me what the future behaviour will be or what will change. On the one hand the warning is saying that the package was included because it may contain data files and that this might change in the future (so in the future it might not be included by default?), while on the other hand you clearly say that any directory is a python module, so I would expect that in the future also any python module, i.e. any directory, would be automatically included?
I tried switching over to find_namespace_packages, seemed to work. Now in another project I'm noticing that the package build is discovering all sorts of random things in my checkout - things not even in my package-source entry point.
What tells setuptools to not have find_namespace_packages looking for an entire repository checkout for Python files?
edit: I am already explicitly excluding tests and tests.* (source) but it's not clear how to configure the right behavior here
I have spent a few minutes trying a src layout but I can't get an editable pip install to work.
edit:
import setuptools
from setuptools import setup

pkg_name = "my-package"
# Obviously in "flat" layout you place files under the package's slug if its name includes a dash.
pkg_slug = pkg_name.replace("-", "_")
setup(
    ...
    packages=setuptools.find_namespace_packages(
        include=[pkg_slug, pkg_slug + ".*"],
        exclude=["tests", "tests.*"],
    ),
    ...
)
Something like this appears to work for me. Where a typical project is structured like:
README.md
setup.py
my_package/
    __init__.py
    __main__.py
    data/
        data1.yaml
        data2.yaml
        ...
tests/
    __init__.py
    test_entry.py
    ...
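The effect of the include= filter can be checked on a temporary copy of a layout like this one. The sketch assumes setuptools is installed; the extra scratch/notes directory is a made-up stand-in for the "random things" a repository checkout may contain.

```python
import tempfile
from pathlib import Path

from setuptools import find_namespace_packages

pkg_slug = "my_package"

with tempfile.TemporaryDirectory() as root:
    # Reduced version of the layout above, plus an unrelated directory.
    for rel in ("my_package/data", "tests", "scratch/notes"):
        (Path(root) / rel).mkdir(parents=True)
    (Path(root) / "my_package" / "__init__.py").touch()
    (Path(root) / "my_package" / "data" / "data1.yaml").touch()
    (Path(root) / "tests" / "__init__.py").touch()

    # Without filters, every directory under root is reported.
    unrestricted = sorted(find_namespace_packages(where=root))

    # With include=, only the package and its subdirectories are reported.
    restricted = sorted(
        find_namespace_packages(
            where=root,
            include=[pkg_slug, pkg_slug + ".*"],
            exclude=["tests", "tests.*"],
        )
    )

print(unrestricted)
print(restricted)
```

Restricting discovery with include= (or pointing where= at a src directory) is what keeps find_namespace_packages from picking up every directory in the checkout.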
It's not really clear to me from the documentation what setuptools actually wants. I think where this software fails is in making it clearer what the nominal package structure is. It's nice that you can kind of do whatever you want and make it work but most of us shipping software out into the wild would rather just conform to something that "just works" and move on. I'm not really seeing that solution emerge out of the discussion or the documentation.
1: your project structure does not use src
2: you are missing pyproject.toml (I forget if pip requires it for editable installs)
3: you have a real package, why use find_namespace_packages?
3: you have a real package, why use find_namespace_packages?
@merwok Suggest reading the thread if this is not clear to you. Having data directories in a package without __init__.py files is deprecated (if you use the regular find_packages method). It's now required to use find_namespace_packages to include package data directories as packages to squelch a deprecation warning.
A setup.py file is generally the requirement for editable installs (why? no idea). I resolved my own issue but anticipate more in the future as the Python ecosystem bike-shedding continues.