Updating MANIFEST.in does not correctly update the package sdist creates

Open ghost opened this issue 10 years ago • 22 comments

Originally reported by: spookylukey (Bitbucket: spookylukey, GitHub: spookylukey)


The behaviour of sdist depends on the previous contents of MANIFEST.in, not just the current contents. This is not fixed even by running setup.py clean or setup.py clean --all (although this should not be necessary).

This is very surprising behaviour, and potentially dangerous too: if someone accidentally adds a MANIFEST.in rule that includes a file that must not be distributed and notices the problem, they would expect that removing the rule will remove the file, but it does not.

I've attached a bash script that demonstrates the problem.


  • Bitbucket: https://bitbucket.org/pypa/setuptools/issue/436

ghost avatar Sep 14 '15 09:09 ghost

I have been hit by this same bug. My current remedy is to delete the *.egg-info directory before running setup.py sdist.
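
A minimal sketch of that workaround in Python, assuming the .egg-info directory sits at the project root (with a src layout it may live under src/ instead). The helper name remove_egg_info is made up for illustration and is not part of setuptools:

```python
import shutil
from pathlib import Path


def remove_egg_info(project_dir="."):
    """Delete every top-level *.egg-info directory and return the removed names."""
    removed = []
    for path in Path(project_dir).glob("*.egg-info"):
        if path.is_dir():
            # The whole directory goes, including the stale SOURCES.txt.
            shutil.rmtree(path)
            removed.append(path.name)
    return sorted(removed)
```

Calling remove_egg_info() right before setup.py sdist forces SOURCES.txt to be rebuilt from the current MANIFEST.in.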

miccoli avatar Feb 27 '17 13:02 miccoli

The problem appears to be in the logic with which the SOURCES.txt file is updated by python setup.py egg_info:

$ python setup.py egg_info
running egg_info
writing sample.egg-info/PKG-INFO
writing dependency_links to sample.egg-info/dependency_links.txt
writing entry points to sample.egg-info/entry_points.txt
writing requirements to sample.egg-info/requires.txt
writing top-level names to sample.egg-info/top_level.txt
reading manifest file 'sample.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'sample.egg-info/SOURCES.txt'

Files already in SOURCES.txt are preserved across subsequent runs of python setup.py egg_info. Could one of the developers please clarify whether this is intended behaviour, and why?

miccoli avatar Feb 27 '17 14:02 miccoli

See also https://github.com/pypa/setuptools/blob/d4c215a7c61fb1f94b88bd2aa0b332ebaff18193/setuptools/command/egg_info.py#L560-L570 where the actual reading of SOURCES.txt takes place:

        rcfiles = list(walk_revctrl())
        if rcfiles:
            self.filelist.extend(rcfiles)
        elif os.path.exists(self.manifest):
            self.read_manifest()

I cannot understand the reason for this logic.
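
To illustrate why entries stick, here is a toy model of the sequence seen in the log above (read the old SOURCES.txt, then apply MANIFEST.in, then write the result). It is not the setuptools implementation: next_sources and its arguments are hypothetical, it assumes no files are found through SCM, and it only models include-style glob patterns.

```python
import fnmatch


def next_sources(previous_sources, template_patterns, files_on_disk):
    """Toy model: start from the old SOURCES.txt entries, then add template matches."""
    filelist = set(previous_sources)   # stands in for read_manifest()
    for pattern in template_patterns:  # stands in for reading MANIFEST.in
        filelist.update(fnmatch.filter(files_on_disk, pattern))
    return sorted(filelist)


files = ["data.txt", "secret.txt"]
first = next_sources([], ["data.txt", "secret.txt"], files)
second = next_sources(first, ["data.txt"], files)
print(second)  # ['data.txt', 'secret.txt']: the removed rule still has an effect
```

Because the old list is the starting point, removing a rule from the template can never remove a file in this model. Only starting from an empty list does.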

miccoli avatar Feb 28 '17 15:02 miccoli

@jaraco I think I nailed down this old (annoying) bug, but nobody seems to care about these old bugs. Should I reopen or provide a pull request?

miccoli avatar Apr 12 '17 09:04 miccoli

@miccoli: If this ticket describes the issue, it's still open. A PR would be most appreciated.

jaraco avatar Apr 12 '17 22:04 jaraco

@jaraco yes: this ticket is accurate. Since there is no discussion active, I assume that I should provide my own PR.

miccoli avatar Apr 13 '17 09:04 miccoli

I think this change may have broken sdist installs with package data: #1016

PiDelport avatar Apr 18 '17 12:04 PiDelport

The reason for

        rcfiles = list(walk_revctrl())
        if rcfiles:
            self.filelist.extend(rcfiles)
        elif os.path.exists(self.manifest):
            self.read_manifest()

is now clear: in the development tree (under SCM control) the files to be installed are determined from the SCM system (walk_revctrl()); in an sdist install the files are determined from the existing manifest file, which was previously generated in the development tree with setuptools_scm.

This logic is broken for packages that do not use setuptools_scm, for which there is no guarantee that the current manifest file (in the development tree) is correct.

Unfortunately my PR #1014 badly breaks this intended behaviour, so that pip install of sdist packages that are under SCM control and use include_package_data=True is no longer possible, see #1016.

miccoli avatar Apr 18 '17 14:04 miccoli

Is there a fix for this now? Even if I delete <package>.egg_info, when I update MANIFEST.in and run python setup.py sdist, a new <package>.egg_info is created and the content of <package>.egg_info/SOURCES.txt is still not updated.

Firenze11 avatar Apr 17 '18 04:04 Firenze11

@Firenze11 unfortunately my previous attempt at solving this bug (PR #1014) was catastrophic. (I'm still a little ashamed of the mess I made.)

But from my analysis the only place where previous content of MANIFEST.in is persisted is <package>.egg_info/SOURCES.txt. In my use cases, if you delete <package>.egg_info/SOURCES.txt and run python setup.py egg_info you get a brand-new SOURCES.txt file which is correctly populated, following the usual setuptools logic and the current MANIFEST.in.

Can you please provide an example in which SOURCES.txt is first deleted and when recreated still contains files included in a previous MANIFEST.in?

miccoli avatar Apr 19 '18 22:04 miccoli

I really think SOURCES.txt should be cleared automatically at some point. This is very surprising behavior. I've been trying to figure out why the changes I was making to MANIFEST.in didn't seem to be working.

rsyring avatar Dec 21 '18 02:12 rsyring

Hi, I am running into the same issue, but I haven't quite figured out how to get only the code in the package "src" folder to be added to the final package.

@miccoli @rsyring could you help me with a bit more explanation? I don't have a MANIFEST.in file and I am not quite sure what steps I should follow to have python setup.py sdist create the package with only the desired files.

Thanks a lot for the help!

lucacerone avatar Apr 21 '20 21:04 lucacerone

@lucacerone This issue is specific to a setup with MANIFEST.in: please first check the docs at https://packaging.python.org/guides/using-manifest-in/ and see whether, by following those instructions, you are able to obtain the desired source distribution.

The only thing that you should be aware of, regarding this specific bug, is that after each change to MANIFEST.in you should delete the <package>.egg-info directory to be sure that SOURCES.txt is regenerated appropriately.
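
To check what actually ended up in the archive, something like the following can help. This is only a sketch: unexpected_sdist_members is a hypothetical helper, it assumes a .tar.gz sdist whose members live under a single top-level <name>-<version>/ directory, and the forbidden patterns are whatever you must not ship.

```python
import fnmatch
import tarfile


def unexpected_sdist_members(sdist_path, forbidden_patterns):
    """Return archive members whose project-relative path matches a forbidden glob."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = [m.name for m in archive.getmembers() if m.isfile()]
    hits = []
    for name in names:
        # Strip the leading "<name>-<version>/" directory (assumed layout).
        relative = name.split("/", 1)[1] if "/" in name else name
        # Note: fnmatch's "*" also matches "/" characters.
        if any(fnmatch.fnmatch(relative, p) for p in forbidden_patterns):
            hits.append(relative)
    return sorted(hits)
```

An empty result for the patterns you removed from MANIFEST.in is a reasonable sign that SOURCES.txt was regenerated.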

miccoli avatar Apr 24 '20 15:04 miccoli

Thanks a lot @miccoli, I managed to get the "data" in my package by adding a Manifest.in.

I don't quite understand what happens to "data" that resides outside of the "package" folder (I know it's not the preferred way, and in the end I moved the folder within the package), because if I add it to the Manifest.in, I see those folders in the resulting archive, but then they don't get copied to the site-packages folder (or any other folder I looked in).

I think the documentation could be a bit more clear on what's going on under the hood, but in the end I found a way that works for me :)

Many thanks for your answer, I really appreciated it!

lucacerone avatar Apr 24 '20 17:04 lucacerone

@miccoli I've also encountered this without a MANIFEST.in, see the repro case in https://github.com/pypa/setuptools/issues/2347

tekumara avatar Aug 30 '20 23:08 tekumara

For me this

The only thing that you should be aware, regarding this specific bug, is that after each change to MANIFEST.in you should delete the .egg-info directory to be sure that the SOURCES.txt is regenerated appropriately.

from @miccoli works fine 😉.

alena-bartosh avatar Aug 10 '21 15:08 alena-bartosh

This bug is so annoying... I tried deleting the .egg-info directory as @miccoli suggested, but I still cannot change the files included in my Python package by updating MANIFEST.in. Even if I delete the whole MANIFEST.in file, when I rerun "$ python setup.py sdist" the SOURCES.txt includes all the old files I had included in MANIFEST.in.

Is there no fix for this bug yet?

MangelFdz avatar Jan 18 '22 16:01 MangelFdz

@MangelFdz do you have a reproduction? This does not seem to be quite the same problem described previously in this issue (for that problem things should work when you remove the .egg-info/build directories).

abravalheri avatar Jan 18 '22 16:01 abravalheri

I have this issue. I have fallen back to just adding back in empty directories corresponding to those that were once in the MANIFEST.in

Not a great solution, but moving on I guess.

microprediction avatar Feb 02 '22 18:02 microprediction

I can confirm that deleting the .egg_info directory helps. I thought I was going insane. It kept adding files even though I removed those from the manifest...

mxmlnkn avatar Jul 11 '22 17:07 mxmlnkn

I have been hit by this same bug. My current remedy is to delete the *.egg-info directory before running setup.py sdist.

Thx! That really solved the problem.

Skylark0924 avatar Jul 03 '24 10:07 Skylark0924

I can confirm the issue still exists, even when building using the PEP 517 builder:

 draft @ cat > MANIFEST.in
include foo*.txt
include bar*.txt
 draft @ cat > pyproject.toml
[build-system]
requires=['setuptools']
build-backend='setuptools.build_meta'
 draft @ touch foo1.txt
 draft @ touch bar1.txt
 draft @ pyproject-build -s .
* Creating isolated environment: venv+pip...
* Installing packages in isolated environment:
  - setuptools
* Getting build dependencies for sdist...
running egg_info
writing UNKNOWN.egg-info/PKG-INFO
writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
writing top-level names to UNKNOWN.egg-info/top_level.txt
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
* Building sdist...
running sdist
running egg_info
writing UNKNOWN.egg-info/PKG-INFO
writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
writing top-level names to UNKNOWN.egg-info/top_level.txt
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md

running check
warning: check: missing required meta-data: name

creating unknown-0.0.0
creating unknown-0.0.0/UNKNOWN.egg-info
copying files to unknown-0.0.0...
copying MANIFEST.in -> unknown-0.0.0
copying bar1.txt -> unknown-0.0.0
copying foo1.txt -> unknown-0.0.0
copying pyproject.toml -> unknown-0.0.0
copying UNKNOWN.egg-info/PKG-INFO -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/SOURCES.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/dependency_links.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/top_level.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/SOURCES.txt -> unknown-0.0.0/UNKNOWN.egg-info
Writing unknown-0.0.0/setup.cfg
Creating tar archive
removing 'unknown-0.0.0' (and everything under it)
Successfully built unknown-0.0.0.tar.gz
 draft @ rm -r dist
 draft @ cat > MANIFEST.in
include foo*.txt
 draft @ pyproject-build -s .
* Creating isolated environment: venv+pip...
* Installing packages in isolated environment:
  - setuptools
* Getting build dependencies for sdist...
running egg_info
writing UNKNOWN.egg-info/PKG-INFO
writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
writing top-level names to UNKNOWN.egg-info/top_level.txt
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
* Building sdist...
running sdist
running egg_info
writing UNKNOWN.egg-info/PKG-INFO
writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
writing top-level names to UNKNOWN.egg-info/top_level.txt
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md

running check
warning: check: missing required meta-data: name

creating unknown-0.0.0
creating unknown-0.0.0/UNKNOWN.egg-info
copying files to unknown-0.0.0...
copying MANIFEST.in -> unknown-0.0.0
copying bar1.txt -> unknown-0.0.0
copying foo1.txt -> unknown-0.0.0
copying pyproject.toml -> unknown-0.0.0
copying UNKNOWN.egg-info/PKG-INFO -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/SOURCES.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/dependency_links.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/top_level.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/SOURCES.txt -> unknown-0.0.0/UNKNOWN.egg-info
Writing unknown-0.0.0/setup.cfg
Creating tar archive
removing 'unknown-0.0.0' (and everything under it)
Successfully built unknown-0.0.0.tar.gz
 draft @ ls -la
total 16
drwxr-xr-x   8 jaraco  staff   256 Jul 11 01:03 .
drwxr-x---+ 60 jaraco  staff  1920 Jul 10 22:21 ..
-rw-r--r--   1 jaraco  staff    17 Jul 11 01:03 MANIFEST.in
drwxr-xr-x   6 jaraco  staff   192 Jul 11 01:03 UNKNOWN.egg-info
-rw-r--r--   1 jaraco  staff     0 Jul 11 01:02 bar1.txt
drwxr-xr-x   3 jaraco  staff    96 Jul 11 01:03 dist
-rw-r--r--   1 jaraco  staff     0 Jul 11 01:02 foo1.txt
-rw-r--r--   1 jaraco  staff    77 Jul 11 01:02 pyproject.toml
 draft @ rm -r UNKNOWN.egg-info/
 draft @ pyproject-build -s .
* Creating isolated environment: venv+pip...
* Installing packages in isolated environment:
  - setuptools
* Getting build dependencies for sdist...
running egg_info
creating UNKNOWN.egg-info
writing UNKNOWN.egg-info/PKG-INFO
writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
writing top-level names to UNKNOWN.egg-info/top_level.txt
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
* Building sdist...
running sdist
running egg_info
writing UNKNOWN.egg-info/PKG-INFO
writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
writing top-level names to UNKNOWN.egg-info/top_level.txt
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md

running check
warning: check: missing required meta-data: name

creating unknown-0.0.0
creating unknown-0.0.0/UNKNOWN.egg-info
copying files to unknown-0.0.0...
copying MANIFEST.in -> unknown-0.0.0
copying foo1.txt -> unknown-0.0.0
copying pyproject.toml -> unknown-0.0.0
copying UNKNOWN.egg-info/PKG-INFO -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/SOURCES.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/dependency_links.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/top_level.txt -> unknown-0.0.0/UNKNOWN.egg-info
copying UNKNOWN.egg-info/SOURCES.txt -> unknown-0.0.0/UNKNOWN.egg-info
Writing unknown-0.0.0/setup.cfg
Creating tar archive
removing 'unknown-0.0.0' (and everything under it)
Successfully built unknown-0.0.0.tar.gz

I don't know why I thought that build would always build from a copy of the sources and not the actual directory.

I think what I'd like to see is for setuptools not to write an .egg-info at all when doing a build. I know it's unnecessary in general (not required by the specs or created by other backends). I wonder if there are any setuptools-specific behaviors that depend on that egg-info directory (in the working directory or the sdist).
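
For anyone who wants a build that cannot see stale state in the meantime, one option is to build from a copy that leaves the egg-info behind. A rough sketch, not something setuptools or build does for you: copy_without_build_state is a made-up helper, and projects that derive their file list from SCM metadata may not build correctly from such a copy.

```python
import shutil
import tempfile
from pathlib import Path


def copy_without_build_state(project_dir):
    """Copy the project to a fresh temp dir, skipping *.egg-info, build and dist."""
    target = Path(tempfile.mkdtemp()) / Path(project_dir).resolve().name
    shutil.copytree(
        project_dir,
        target,
        # The patterns apply at every level, so nested directories named
        # "build" or "dist" are skipped as well.
        ignore=shutil.ignore_patterns("*.egg-info", "build", "dist"),
    )
    return target
```

One could then run pyproject-build -s against the returned path instead of the working directory.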

jaraco avatar Jul 11 '24 05:07 jaraco

I can confirm this bug still exists in setuptools 80.9.0.

As you can see in the output below, when the egg_info command is run a second time, the file egg_info/SOURCES.txt is read as a manifest file. Thus all files that were included in the previous run will be included again, even if MANIFEST.in no longer lists them.

Attached is a shell script demonstrating the issue (test-st-bug-436.txt); below is a shortened pseudo-output from this script. The script is based on the "details" from the last comment, but uses plain setuptools.

====Preparation====
+ pip show setuptools
Name: setuptools
Version: 80.9.0
…
+ rm -rf /tmp/st-bug-436
+ mkdir /tmp/st-bug-436
+ cd /tmp/st-bug-436
+ echo >> MANIFEST.in 'include aaa*.txt'
+ echo >> MANIFEST.in 'include bbb*.txt'
+ echo aaaaa > aaa1.txt
+ echo bbbbb > bbb1.txt

+ cat > pyproject.toml << EOF
[build-system]
requires=['setuptools']
build-backend='setuptools.build_meta'
EOF

==== build egg_info/SOURCES.txt ====
+ python -c 'from setuptools import * ; setup()' egg_info
running egg_info
…
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
+ cat UNKNOWN.egg-info/SOURCES.txt
…
aaa1.txt
bbb1.txt
…

==== bbb1.txt is included, okay ===

======== Now demonstrating the bug =========
=== Write new MANIFEST.in not containing bbb*.txt ===
+ echo > MANIFEST.in 'include aaa*.txt'

==== build egg_info/SOURCES.txt ====
+ python3.10 -c 'from setuptools import * ; setup()' egg_info
running egg_info
…
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
+ cat UNKNOWN.egg-info/SOURCES.txt
…
aaa1.txt
bbb1.txt
…

==== bbb1.txt is still included, BUG ====

htgoebel avatar Jul 30 '25 19:07 htgoebel