perl5 icon indicating copy to clipboard operation
perl5 copied to clipboard

Create Porting/MANIFEST.dev as a complement to MANIFEST

Open demerphq opened this issue 3 years ago • 16 comments

This file is intended to list all the files in the repo which are not listed in the main MANIFEST file, and which are used only for development purposes, especially those files which are only useful when working in a git checkout of the main perl git repository.

The files it contains will NOT be added to the production tarball release. The file has the exact same format as the main MANIFEST: "file\t+description" or "file".

Q. Why didn't I call this Porting/MANIFEST as mentioned in the discussion thread that lead to this patch?

A. The main reason was that Porting/README.pod includes a list of files in Porting with descriptions and explanations for what the files do or how they are used. In several places the file refers to "MANIFEST", which lead to ambiguity that would have had to be resolved by changing all the entries to refer to "Porting/whatever" instead. It was much simpler to give the new file an extension, and I thought that '.dev' suggests it is for "development" purposes.

Q. Why isn't this using MANIFEST.skip style functionality?

A1. Various parts of our build and test process expect to read the MANIFEST file and then do things based on the entries contained within. Eg, run tests, or extract data, or compare the file list to content in another file. Those parts of our build process would break if we used a skip style list of regexen. So it would be more work to teach them to deal with such a file, assuming it was actually doable - given the additional work I have not considered it deeply. On the other hand teaching that logic to simply read two files was and is easy.

A2. I think each file we have in the repo should have a description. This patch currently doesn't provide a description for each, but it does for many, especially those migrated from MANIFEST.

A3. I think that MANIFEST.skip style files of exclusion regexens and globs are error prone and easy to mess up, for instance by excluding far more than you had intended to. They can also be annoying to get right, obviously not impossible, but sometimes annoying. Explicitly listing everything is easy in every way, especially to mechanize.

A4. I would like to be able to move verbatim entries from our existing MANIFEST into the new Porting/MANIFEST.dev, description and all. MANIFEST.skip style files do not support descriptions except as comments as far as I recall. That would have meant munging the data from MANIFEST during the move process which would be annoying.

A5. I would like to be able to reuse our sorting logic to keep the files nicely sorted in a way where the file is somewhat readable. A list of skip files would be less amenable to doing so.

Q. There is a lot of duplicated logic related to manifests, should we refactor it out into a module or some resuable tool set?

A. YES! We already have Porting/manifest_lib.pm, but it currently does not declare a package, and it only contains one function. Instead of adding yet more code that depends on requiring a file and having it inject subs into package main I decided that doing the refactoring could wait for a separate commit or PR. But I definitely think we should refactor as much of this logic as possible.

Q. Some of the test files were fairly significantly changed, are you sure you didn't break or drop any of the tests?

A. I am reasonably confident I did not. Secondary review appreciated. Some of the touched files are quite old and obviously "quick hack" scripts. By rewriting them quite a bit I was able to simplify and perform some of the tests in different ways or parts of the script. As far as I know I didn't drop any.

Q. Why didn't you use newer features in the rewrite?

A. I am a bit conservative in my taste, and I like build tools to be able to run on older perls, and for things like this I prefer to stick with what I know well. Patches welcome.

Q. Why didn't you move more of the stuff we shouldn't bundle with our releases?

A. I figured someone like Nicolas R. (who helped motivate this patch) would feel left out if I didn't leave him anything to do. :-)

demerphq avatar Mar 10 '22 09:03 demerphq

I don't think this is the right solution. We should be reusing out existing MANIFEST.SKIP infrastructure and conventions instead of reinventing a wheel.

Leont avatar Mar 10 '22 13:03 Leont

I don't think this is the right solution. We should be reusing out existing MANIFEST.SKIP infrastructure and conventions instead of reinventing a wheel.

Essentially what we should do is just

use ExtUtils::Manifest;
my $skip = maniskip;
chomp(my @git-files = `git ls-files`);
my @files = grep !$skip->($_), @git-files;

Leont avatar Mar 10 '22 13:03 Leont

On Thu, 10 Mar 2022 at 14:37, Leon Timmermans @.***> wrote:

I don't think this is the right solution. We should be reusing out existing MANIFEST.SKIP infrastructure and conventions instead of reinventing a wheel.

Essentially what we should do is just

my $skip = maniskip; chomp(my @git-files = git ls-files); my @files = grep $skip->($_), @git-files;

We dont use any of the standard manifest infra for our MANIFEST files.

But I'll look into it and consider. I personally would prefer a proper MANIFEST file for a number of reasons including a proper description of each file, and because I have seen MANIFEST skip files turn into a horrible mess.

But for now I'd rather not debate it with you. When the PR goes out of "work in progress" I will be happy to discuss more.

cheers, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq avatar Mar 10 '22 14:03 demerphq

I personally would prefer a proper MANIFEST file for a number of reasons including a proper description of each file

I do agree with that

Leont avatar Mar 10 '22 14:03 Leont

Ok, I have removed the "Work In Progress" flag, and I am ready to debate this now. Thank you for your patience. :-)

Please review the PR message (same as the commit message), it explains why I don't think that this should be a MANIFEST.skip functionality. The strongest argument IMO is A1, which I will paraphrase here:

Quite a bit of our build tooling expects to be able to read the MANIFEST file and then do stuff based on the contents. MANIFEST.skip is designed to exclude things from the creation or updates of a MANIFEST, usually via "make manifest" in a normal CPAN dist (which our Makefile does not support!). But we want to use these lists of files to do further processing, so a list of regexes and file globs doesn't really cut it. Especially in terms of ease of conversion to use the new files having two manifest like files is much easier to deal with, all the places we load the current MANIFEST can be changed to load both and operate on the union. Doing the same thing with skip file would IMO take a lot more effort for minimal benefit. In other words while it might seem like using maniskip() type functionality is suited to the task, IMO it really isn't.

There are other IMO good reasons this should not "just" reuse the maniskip functionality, and I have listed them all in the commit message, but the fact that we want to operate and process the list of files is the strongest.

demerphq avatar Mar 10 '22 17:03 demerphq

This does a bunch of tooling changes, a bunch of changes to what files are shipped in a tarball, a few additions to the descriptions of files in the MANIFEST and updating a piece of documentation. Those should at the very least be split into 4 separate commits (and possibly the tooling changes could be split further)

Leont avatar Mar 10 '22 18:03 Leont

Various parts of our build and test process expect to read the MANIFEST file and then do things based on the entries contained within.

What are these things, that might be adversely affected by listing some Porting/* files in MANIFEST.SKIP?

I think that MANIFEST.skip style files of exclusion regexens and globs are error prone and easy to mess up, for instance by excluding far more than you had intended to

MANIFEST.SKIP can also handle literal entries, without wildcards, if it is more convenient to list every file out longhand.

karenetheridge avatar Mar 10 '22 19:03 karenetheridge

What are these things, that might be adversely affected by listing some Porting/* files in MANIFEST.SKIP?

As far as I can tell that's only t/porting/exec-bit.t, except it's wrong. Porting/exec-bit.txt is what we use to set +x bits when building a release, so files that aren't part of a release shouldn't be listed in it. So it doesn't need the skip-list after all.

Leont avatar Mar 10 '22 19:03 Leont

Q. Why isn't this using MANIFEST.skip style functionality?

A1. Various parts of our build and test process expect to read the MANIFEST file and then do things based on the entries contained within. Eg, run tests, or extract data, or compare the file list to content in another file. Those parts of our build process would break if we used a skip style list of regexen. So it would be more work to teach them to deal with such a file, assuming it was actually doable - given the additional work I have not considered it deeply. On the other hand teaching that logic to simply read two files was and is easy.

That seems to assume that using MANIFEST.SKIP means not using MANIFEST. I don't think anyone has suggested doing away with the latter. We would use MANIFEST.SKIP for two reasons: regenerating the MANIFEST, and testing it for correctness. Anything else would use MANIFEST as it does now.

A2. I think each file we have in the repo should have a description. This patch currently doesn't provide a description for each, but it does for many, especially those migrated from MANIFEST.

Descriptions can be useful, but I really don't see the value of «.gitignore rules for the Digest-SHA cpan distribution» for dozens of cpan distributions.

A3. I think that MANIFEST.skip style files of exclusion regexens and globs are error prone and easy to mess up, for instance by excluding far more than you had intended to. They can also be annoying to get right, obviously not impossible, but sometimes annoying. Explicitly listing everything is easy in every way, especially to mechanize.

We have tests. We are checking for correctness. I don't see how this would suddenly become error-prone. Again, this seems to assume that there wouldn't be a MANIFEST file, which is not how MANIFEST.SKIP is typically used.

A4. I would like to be able to move verbatim entries from our existing MANIFEST into the new Porting/MANIFEST.dev, description and all. MANIFEST.skip style files do not support descriptions except as comments as far as I recall. That would have meant munging the data from MANIFEST during the move process which would be annoying.

As far as I know, MANIFEST.SKIP does support descriptions (though it isn't well-documented).

Leont avatar Mar 10 '22 19:03 Leont

t/porting/readme.t and t/porting/readme.t. Also some of our test logic uses MANIFEST to get the list of tests to run. So far we havent moved any t files into Porting/MANIFEST.dev but it seems reasonable that we will. I expect @atoomic will be pushing some patches on top of this branch to do things like that.

You are welcome to migrate this all to use MANIFEST.skip instead, I don't think its worth the effort.

demerphq avatar Mar 11 '22 04:03 demerphq

I found another 7 places where we use the manifest file to drive testing or other processing. I really don't get how you propose to make this work with MANIFEST and MANIFEST.SKIP. The idea of MANIFEST.SKIP is to designate files that should NOT be included in the manifest. So for the places where we want to have a way to get a list of the things we would have to abuse the MANIFEST.SKIP functionality to do so, or rewrite code to walk the disk or something, which brings additional side-issues with it. So likely we would have to read two file formats. The work I just did was mostly mechanical, to make the code read two files and not one.

Nothing i said implied that using MANIFEST.SKIP would NOT involve using MANIFEST, it implies that MANIFEST would be missing files, and thus in the places where we want to read a manifest of ALL the files, including those not part of the final tarball, that we would have to read that list from two totally differently formatted files. Your way would involve more complex processing, and likely more complex changes. I didn't want to get into analyzing whether choices made by previous devs in those files was sensible, for instance your point about the execute-bit test, somebody decided it was a good idea to have files in Porting check that the files were listed there. I don't know if that was a good decision or not, ill leave that to someone else.

Regarding your comment about descriptions for .gitignore, sure, that doesn't add that much value, I just did that for completeness and to establish the trend that things should be described. And in one of the comments I included some information about how to get documentation on .gitignore, which I think is useful, likewise on .gitattributes, and I described .git_patch, and the other files in the list do have useful descriptions. So I don't really think your point about .gitignore is very helpful to the discussion. You seem to think this discussion is about having a way to specify the things we ignore in certain places. IMO that is not the point, the point to me is how do we have our cake, a fully documented MANIFEST like database of our files, and still be able to distinguish between files that should be in the tarball and final release and those that shouldn't, and how and where do we authoritatively document and list the files. The fact that previously we listed almost everything in MANIFEST and then in a few places where tested that MANIFEST listed everything we filtered out a few special cases isn't in my eyes representative of the core problem I want to solve here, if it was then I admit MANIFEST.SKIP would be the way to go. In fact I could easily imagine my plan coupling with the use of a MANIFEST.SKIP file as well, for the few places where we want to look at the union of all our files and then exclude some files based on regex. There are two such examples in the code I just pushed. But that wouldn't make MANIFEST.dev go away, it would add a third data file. Personally i don't think there is evidence /right now/ that we should use such a complex a mechanism, the two cases are very specific. One of them might even totally redundant.

Maybe your proposal would "perfect", but IMO this is a good case of perfect is the enemy of good enough, this is good enough, it works, and it passes test, it is conceptually simple, and it scratches the itch that was expressed. Also, my plan has one big advantage over your plane, mine is already done. Maybe you want to put together a patch that does it your way to prove your point, but I don't think your plan is as good an idea as you do so I won't be doing it that way myself.

demerphq avatar Mar 11 '22 06:03 demerphq

Also, my plan has one big advantage over your plane, mine is already done.

I can't overstate how annoyed I am at this statement, even if I was expecting it.

Not just because ignores the obvious and entirely valid third option of leaving things the way they are, but primarily because it is a rather hostile way of cooperation. You start with «But for now I'd rather not debate it with you. When the PR goes out of "work in progress" I will be happy to discuss more.», and then the next day you come up with this "I already wrote it so we might as well merge it".

Frankly, if you can't even pretend to be interested in actually listening to what other people are saying I'm not going to waste my time and energy on this discussion and would rather suggest for this ticket to be closed.

Maybe you want to put together a patch that does it your way to prove your point, but I don't think your plan is as good an idea as you do so I won't be doing it that way myself.

Sure: #19523

Leont avatar Mar 11 '22 13:03 Leont

Also, if you're going to make fundamental changes to how tarballs are built, you probably want to double check if that tarball works. Right now it fails to build a tarball, but even if it didn't I am expecting it to fail some of those porting tests because of missing files.

Leont avatar Mar 11 '22 14:03 Leont

On Fri, 11 Mar 2022 at 15:12, Leon Timmermans @.***> wrote:

Also, if you're going to make fundamental changes to how tarballs are built, you probably want to double check if that tarball works. Right now it fails to build a tarball, but even if it didn't I am expecting it to fail some of those porting tests because of missing files.

Yeah, well I wasnt expecting to merge it until we moved the porting tests into the MANIFEST.dev, I specifically stated I was expecting atoomic to add patches.

Yves

demerphq avatar Mar 12 '22 01:03 demerphq

On Fri, 11 Mar 2022 at 14:37, Leon Timmermans @.***> wrote:

Also, my plan has one big advantage over your plane, mine is already done.

I can't overstate how annoyed I am at this statement, even if I was expecting it.

Not just because ignores the obvious and entirely valid third option of leaving things the way they are,

Well Nicolas R made a request I thought was reasonable, and so did Andy. So it seemed worth the time to do. I don't regret my investment on it at all. Even if it doesn't get merged as is. The patches to manisort are an improvement, and a few other parts are too. I'd also argue that whatever we do a nice chunk of the change to manifest.t should stay as well. The rest was mostly mechanical.

but primarily because it is a rather hostile way of cooperation. You start with «But for now I'd rather not debate it with you. When the PR goes out of "work in progress" I will be happy to discuss more.»,

Well, I was pretty annoyed at you actually, as you just dismissed the idea out of hand without demonstrating that you understood the wider ramifications of what you were suggesting. It felt pretty superficial and at least a little bit disrespectful.

and then the next day you come up with this "I already wrote it so we might as well merge it".

I never said that. Not once did I say we should merge it as it stands. I said it had the advantage that it was done (in terms of the infra, not the final adjustments to the MANIFEST.dev), and said if you want to do your plan YOU do the work. That isn't the same thing /at all/. My comment was inspired by the fact that I think it will be a lot more work to do it your way, and have significant downsides and I wasn't interested in doing a lot more work to implement what I think is an inferior plan. I tried to address all the reasons I think it will have downside, but you didnt really engage with them and instead zeroed in on things like having descriptions for .gitignore not adding value. Again it felt kinda disrespectful.

Frankly, if you can't even pretend to be interested in actually listening to what other people are saying I'm not going to waste my time and energy on this discussion and would rather suggest for this ticket to be closed.

I did listen to you. I just don't agree with you. And I think you just ignored some of my valid arguments, so I don't really think your outrage is justified. As far as I know I engaged with pretty much every point you raised, or at least I really did try to.

Maybe you want to put together a patch that does it your way to prove your point, but I don't think your plan is as good an idea as you do so I won't be doing it that way myself.

Sure: #19523 https://github.com/Perl/perl5/pull/19523

Looking forward to reviewing it. If it accomplishes the goal and the community thinks it's a better plan I am fine with extracting the reusable parts of my patch sequence and abandoning the PR. I really don't mind either way.

cheers, Yves

demerphq avatar Mar 12 '22 02:03 demerphq

Note: https://github.com/Perl/perl5/pull/19542 provides one suggestion to merge both ideas from https://github.com/Perl/perl5/pull/19513 and https://github.com/Perl/perl5/pull/19523

atoomic avatar Mar 17 '22 17:03 atoomic