perl5
perl5 copied to clipboard
DRAFT: Replace checkAUTHORS.pl with updateAUTHORS.pl/updateAUTHORS.pm, allow users to be excluded
In #20010 a user has asked that we do not list their name or email any of our latest files in the repo.
This was a bit problematic as updateAUTHORS.pl would just add the user back if we did so. I added functionality to updateAUTHORS.pl to support us using hashed contributor details as a way to prevent users from being added by our automation. Unfortunately this then meant that checkAUTHORS.pl and updateAUTHORS.pl had different idea about what should be in the files. There are other issues, the checkAUTHORS.pl file contains a lot of data in a hard to reuse or manage form, and in plaintext, (although slightly munged so its not a direct scraper target). In the end it made a lot more sense to me to simply drop checkAUTHORS.pl and replace it with updateAUTHORS.pl for all use cases.
This patch sequence adds support for a bunch of features that checkAUTHORS.pl had that updateAUTHOR.pl did not, almost uniformly related to reporting so that the tool is now nearly feature equivalent with checkAUTHORS.pl.
There are a few differences:
- The old tool supported saving the commit history to a file and then piping into the checkAUTHORS.pl, this mode is no longer supported, although if we need it we can bring it back, but with a certain amount of bodge. The reason for the bodge is that updateAUTHORS.pm expects its log data in a very particular format that is not supported out of the box by git log. You have to construct a long and awkward format string to provide for log to use, so it is not quite so simple to pipe the results to a file. If we really need it we can figure out a solution. An upside of this is that the tool doesn't expect filenames as arguments, instead it expects a get ref spec just like git log would do so it shouldnt really be needed to pipe the results.
- The old tool supported querying github. updateAUTHORS.pl doesnt support this as far as I remember. I dont think we were using it anyway. We can teach updateAUTHORS.pl to do the same thing if someone remembers why were doing this.
- The old tools supported a couple of options that IMO no longer make sense. Specifically the "--missing" option is redundant. --validate/--tap does the same thing.
- The old tool had some (out of date) information on committers which I have abandonded. We currently do not track this. I was thinking if we want to track it we should add a * to peoples AUTHOR data and then add a footnote explaining they are committers. Then we can use the AUTHORS file as the primary source of this data.
This PR adds two new files
- Porting/updateAUTHORS.pm which is the brains of the updateAUTHORS.pl tool separated from the command line part of the two so that the POD for using the tool could be separated from the pod of how the tools works.
- Porting/exclude_contrib.txt. This contains a list of SHA-256 in base64 of the email details of contributors (commit author/email) that we should not include in .mailmap or AUTHORS.
This PR removes two files:
- Porting/checkAUTHORS.pl - replaced by Porting/updateAUTHORS.pl
- t/porting/pending-author.t - I couldnt figure out what this was for, and it doesnt seem to be required with the updateAUTHORS.pl workflow. If someone can explain what it was used for better than the test file itself then I would have no problem adding it back.
This PR removes the personal details as requested in #20010
The latest version of updateAUTHORS.pl has support for additional reports, and I expect as I have time I will add more. IMO it is much easier to do this with updateAUTHORS.pl than it was with checkAUTHORS.pl (because of the reliance on the git standard log formats). I plan to add a "who-works-on-what" to show file data by author. Eg, it might show that @khwilliamson, myself and @hvds do most of the changes to the regex engine.
I have marked this PR as draft for now, as i expect to make changes as I get feedback, but the patch it pretty stable, review appreciated!
t/porting/pending-author.t - I couldnt figure out what this was for, and it doesnt seem to be required with the updateAUTHORS.pl workflow. If someone can explain what it was used for better than the test file itself then I would have no problem adding it back.
If there are pending changes, it checks if the current author is listed as author. This is to prevent the test-suite from succeeding before making one's first commit but failing after it.
On Mon, 8 Aug 2022 at 23:10, Leon Timmermans @.***> wrote:
t/porting/pending-author.t - I couldnt figure out what this was for, and it doesnt seem to be required with the updateAUTHORS.pl workflow. If someone can explain what it was used for better than the test file itself then I would have no problem adding it back.
If there are pending changes, it checks if the current author is listed as author.
What is a pending change?
Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
On Tue, 9 Aug 2022 at 09:09, demerphq @.***> wrote:
On Mon, 8 Aug 2022 at 23:10, Leon Timmermans @.***> wrote:
t/porting/pending-author.t - I couldnt figure out what this was for, and it doesnt seem to be required with the updateAUTHORS.pl workflow. If someone can explain what it was used for better than the test file itself then I would have no problem adding it back.
If there are pending changes, it checks if the current author is listed as author.
What is a pending change?
To expand on my failure to understand this concept, authors.t checks that the AUTHORS file is up to date. That means it checks that if you ran updateAUTHORS.pl nothing would change given the commits that are in the repo when it runs.
When pending-author.t runs what does it check?
Our tests run in two scenarios relevant to this discussion, on peoples local box while developing and on CI when the code is pushed.
When this executes on my local box what is it testing that authors.t doesnt test? When this runs on CI when the code has been pushed what is it testing that authors.t doesnt test?
I cannot work out a reasonable understanding of what a "pending change" is in this context. Either the commit is in the commit log and it is being tested or it is not. If it is then it gets tested by authors.t, if the commit isn't in the commit log then it isn't pending it is non-existent as far our testing goes.
cheers, Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
What is a pending change?
Uncommited changes
@Leont - Ok, when I first read pending-author i think i misunderstood what it does and got myself in a muddle, the term "pending" really threw me. I should have just ignored that and looked at the code more closely.
So it is looking to see if there are uncommitted changes that if they were committed with the current users details would result in a test failure. I dont see why we need another test file for this, we can just put it into the authors.t testing. I will add that.
Ok, authors.t now does the equivalent of what pending authors used to do, arguably better as it now checks (nearly) all the things that git would. I also added copious verbiage if the tests fail.
Some early feedback without really looking at the code yet:
- commit
Porting/updateAUTHORS.pl - add support for excluding contributors:- based on the commit description I got a feeling that the commit tries to do too much at once (i.e. consider splitting the commit?)
Porting/checkAUTHORS.pl which does not do the right thing these days anyway.: Is it possible/sensible to include a small description of how/when it doesn't do the right thing?
- commit
remove redundant data from .mailmap:- can you describe in the commit message what you consider redundant data?
- and the reason why this matters? [I'm not disputing that there - probably - is a good reason for it; but I like seeing it in the commit message]
On Thu, 11 Aug 2022 at 12:56, Bram @.***> wrote:
Some early feedback without really looking at the code yet:
commit Porting/updateAUTHORS.pl - add support for excluding contributors:
based on the commit description I got a feeling that the commit tries to do too much at once (i.e. consider splitting the commit?)
Ok.
Porting/checkAUTHORS.pl which does not do the right thing these days anyway.: Is it possible/sensible to include a small description of how/when it doesn't do the right thing?
Sure. I think it was that it doesn't understand the new Porting/exclude_contrib.txt but ill update the commit.
commit remove redundant data from .mailmap:
can you describe in the commit message what you consider redundant data?
When the LHS and RHS of a .mailmap line are the same.
and the reason why this matters? [I'm not disputing that there - probably - is a good reason for it; but I like seeing it in the commit message]
Actually this is a controversial point. Some folks seem to prefer us not having it in the .mailmap. It also makes the mailmap larger.
I personally dont think it matters and think that removing the entries results in us losing capabilities, specifically the ability to recognize when someone has manually updated the AUTHORS file to change their preferred email or their name. I only implemented it because it came up multiple times and I got tired with discussing it. To compensate I added options to change name via the command line.
Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
BTW, I am still not entirely satisfied with a couple of aspects of this PR, it might be wise to wait until I say so before anyone wastes their time reviewing code I am going to kill.
Hi @bram-perl I have update the code and include I think all of your feedback. With the amount that I have rebased, squashed, and otherwise munged the sequence I am going to mark all your previous feedback as resolved (as I hope it is) and start fresh. The patch sequence should be much cleaner now with much of the interim stuff squashed out. Please let me know what you find that I overlooked. I have been staring at it so long I have become desensitized. :-)
That last push just removes a function made obsolete by the checkAUTHORS removal.
Hi @bram-perl I have incorporated most of your feedback, either literally as you suggested, or in some case in spirit but i went a different way.
The only points I did not go along with were the following two:
- extensive docs in updateAUTHORS.pm. First, updateAUTHORS.pm is the "brain" of "updateAUTHORS.pl", and many things are documented in the .pl script that apply to the .pm file. So I added a paragraph saying that people should be familiar with both if they want to hack on this, but to put it simply I don't think the docs for the internals of a Porting/ script need to be at CPAN quality levels, and definitely that they aren't shouldn't be a blocker to move this ball forward now. This code isn't intended for reuse (outside of the Porting directory anyway) and its already better documented and structured than what it replaces, likely I will be one of the few who work on it, and frankly I'd like to put it to bed for now for a while. Patches welcome of course if you think otherwise, and I might even do a follow up PR with more docs when I have more energy, but right now I am growing exhausted with this PR and I would like to get it merged and rest up on it a bit before I do anything else. Doc patches are sometimes best written by the person who didnt write the code anyway, as the have a better eye for what is obvious or not.
- The control logic for authors.t, I think we should try to avoid commit range scanning in our testing as I believe it is fragile and likely to break. I am open to the idea that we need to do something special for TRAVIS but i would like to see what breaks before I say what it is. My feeling is right now are doing it wrong, and we should consider an alternative strategy, but i need data about how the test fails without the TRAVIS special case.
So if you don't mind I will let you push a follow PR for point 1. For point 2, I am open to doing something, but I would really like to know how this fails when that travis logic fires, which is simplest done by merging, waiting for it to fail (if it does) and then looking at the failure and then dealing with it. If folks are ok with that I would like to proceed that way.
-
The docs: ok,you did document some of it so I wasn't sure how extensive you wanted the docs so I just added a comment when I spotted something. Not wanting to write/have extensive docs for it is acceptable (at least to me it is).
-
As suggested earlier: I would just take it out of this PR and submit it as a separate PR. (I think the error in the revision logic is already know, and that PR #20113 is the fix for it but the reasons why it's needed - if it's needed at all - should be better documented)
I'll try todo a final review tonight on this PR;
Re: POD docs for the module. I would /like/, long run, to have complete docs for this, but I don't see those docs as a necessary precondition to getting the PR merged. Rather I would like to get this merged and then incrementally improve it over time, ideally with the support from others who care about these things. IMO after the PR (very much with your help) updateAUTHORS.pl will be a better tool than we had with either itself or checkAUTHORS.pl before this PR. We can incrementally make it even better as we like once its merged.
BTW, I am deliberately pushing commits that fail the authors test right now, i will force push the fix for it aftewards
The subject of this pull request indicates it's a "DRAFT". However, it's not categorized as a draft on github. Could that be clarified?
@bram-perl fwiw i can see some issues where the commit messages are a little out of sync with the contents. ill fix that today and repush.
@bram-perl fwiw i can see some issues where the commit messages are a little out of sync with the contents. ill fix that today and repush.
ok, just let me know when it's done and I'll review
@jkeenan I think I will remove the "DRAFT" relatively soon, its pretty close to being ready for merge now IMO, so not sure there is any point in getting github to think it is one.
@bram-perl I think I fixed all the issues I noticed when I reviewed. Please go ahead, ill avoid any force pushes until i hear from you. Thanks for your time!
It's looking good, there are some minor changes I would still do (moving things from one commit to another); but rather then burden you with them I did these myself using fixup commits[^1].
Applying them is slightly trickier but following steps should work assuming:
- your current tree is clean (no uncommited changes, no untracked files, ..)
- your first commit is e6d0e3e
- your last commit is 06460a3
In your perl5 repo: (i.e. origin=github.com/perl/perl5)
git checkout yves/exclude_contrib
git remote add bram https://github.com/bram-perl/perl5
git fetch bram
git rebase -i 7aa33fb
in the editor put:
pick e6d0e3e testsuite.yml - fix and rename "authors" CI workflow to "authors_involved_debug"
pick 673d0f5 .mailmap - add missing entries using updateAUTHORS.pl
pick f1ebb11 updateAUTHORS.pl - split into module Porting/updateAUTHORS.pm
pick 3175823 fixup! updateAUTHORS.pl - split into module Porting/updateAUTHORS.pm
pick 6ac1a3b fixup! updateAUTHORS.pl - split into module Porting/updateAUTHORS.pm
pick bdb8db5 fixup! updateAUTHORS.pl - split into module Porting/updateAUTHORS.pm
pick fa0934c updateAUTHORS.pm - use object to store state and avoid passing args
pick 016ab00 fixup! updateAUTHORS.pm - use object to store state and avoid passing args
pick d3b46b6 fixup! updateAUTHORS.pm - use object to store state and avoid passing args
pick f734e7f updateAUTHORS.pl - use Getopt::Long in "hash ref" mode
pick 4617c69 updateAUTHORS.pl - add support for verbose
pick d131c5c updateAUTHORS.pm - don't update files if they don't change
pick 8521c85 fixup! updateAUTHORS.pm - don't update files if they don't change
pick c6a70a9 fixup! updateAUTHORS.pm - don't update files if they don't change
pick f07e026 fixup! updateAUTHORS.pm - don't update files if they don't change
pick 4187e3b updateAUTHORS.pl - add support for a commit range
pick 09834f4 fixup! updateAUTHORS.pl - add support for a commit range
pick 6e098ff updateAUTHORS.pm - use _file suffix for methods that read/update files
pick 3a21d07 updateAUTHORS.p[lm] - add exclusion support
pick 7761494 fixup! updateAUTHORS.p[lm] - add exclusion support
pick 5691a27 fixup! updateAUTHORS.p[lm] - add exclusion support
pick 615cd2c fixup! updateAUTHORS.p[lm] - add exclusion support
pick f35118a updateAUTHORS.p[lm] - add support for reports like checkAUTHORS.pl has
pick a4fcfb7 fixup! updateAUTHORS.p[lm] - add support for reports like checkAUTHORS.pl has
pick 21d6fb6 fixup! updateAUTHORS.p[lm] - add support for reports like checkAUTHORS.pl has
pick 19ebf33 updateAUTHORS.pm - Fix windows support for git operations
pick 0751ba3 updateAUTHORS.pm - use fc() for sorting, and improved subs to do so
pick 6d82b90 updateAUTHORS.pl - Add debugging option
pick 62db29e AUTHORS - update the blurb
pick 9f5c7f9 updateAUTHORS.pl - add support for --tap
pick cb19cf0 updateAUTHORS.pl - improved pod explanations
pick eec890a updateAUTHORS.pm - remove redundant data from .mailmap
pick 9fd8bda fixup! updateAUTHORS.pm - remove redundant data from .mailmap
pick 3d29923 updateAUTHORS.p[lm] - Add a way to rename authors properly
pick 6659c97 checkAUTHORS.pl - delete and replace with updateAUTHORS.pl
pick e6ca45b AUTHORS, .mailmap - resolve issue #20010
pick dbb255d *FIXUP* reports - make sure we turn on numstat when required
pick 5817f37 *FIXUP* for reports patch
pick 3cf49b8 *FIXUP* - remove commented out debug code (redundant mailmap)
pick 1fe51f2 updateAUTHORS.p[lm] - more generic update skipping
pick 87befb8 update_authors.t - test updateAUTHORS.pl
pick 06460a3 updateAUTHORS.p[lm] - filter out "unknown" authors
close the editor, the rebase will/should give 5 conflicts, fix them using:
git checkout --theirs . && git add . && git rebase --continue
(don't worry about the conflicts, there are fixup to fix it up)
And when that is all done the final step is:
git rebase -i 7aa33fb --autosquash
[^1]: some of the fixup are duplicated to fix things (again) after a the git checkout --theirs
one thing to consider (that I did not change):
updateAUTHORS.pl in it's final version contains:
sub main {
....
my $changed= $self->read_and_update();
...
return $changed; # 0 means nothing changed
}
exit(main()) unless caller;
Meaning when it did make changes the exit code will be 1. That seems a bit strange.
Thanks for the feedback and commits. Ill review them and the instructions you left and get them merged up. Regarding the exit code my thought was that things like this:
$ Porting/updateAUTHORS.pl --no-update || echo "Files need updating"
$ Porting/updateAUTHORS.pl || echo "Files were changed"
might be useful behavior. It should be documented tho. Ill tweak that when i integrate your fixups.
BTW the test fails are because I didnt realize the tests were run under a shallow clone. Not sure what I can do about that except skip them in CI or change the CI workflow to use a full clone.
Thanks for the feedback and commits. Ill review them and the instructions you left and get them merged up.
If you should run into troubles let me know (or come find me on IRC)
Regarding the exit code my thought was that things like this:
$ Porting/updateAUTHORS.pl --no-update || echo "Files need updating" $ Porting/updateAUTHORS.pl || echo "Files were changed"might be useful behavior. It should be documented tho. Ill tweak that when i integrate your fixups.
If it's intended behavior then it's OK.
(The || echo "Files were changed" is a bit simplistic tho, it's really: || echo "Files were changed or something went wrong")
BTW the test fails are because I didnt realize the tests were run under a shallow clone. Not sure what I can do about that except skip them in CI or change the CI workflow to use a full clone.
The 'sanity check' in the CI runs with a full clone so it should be able to do the tests there. Which would mean the other builds should be able to skip it since the sanity check already verified it. Can the test detect it's run with a shallow clone? (and then skip the tests?)
I've also been thinking about CI/authors test in general but that's more a discussion to have in #19867 and not in this PR
On Sat, 20 Aug 2022, 11:37 Bram, @.***> wrote:
Thanks for the feedback and commits. Ill review them and the instructions you left and get them merged up.
If you should run into troubles let me know (or come find me on IRC)
Regarding the exit code my thought was that things like this:
$ Porting/updateAUTHORS.pl --no-update || echo "Files need updating" $ Porting/updateAUTHORS.pl || echo "Files were changed"
might be useful behavior. It should be documented tho. Ill tweak that when i integrate your fixups.
If it's intended behavior then it's OK. (The || echo "Files were changed" is a bit simplistic tho, it's really: || echo "Files were changed or something went wrong")
True. We can use a specific exit code to be able to disambiguate.
Yves
A shallow clone can be detected by checking: git rev-parse --is-shallow-repository or by checking if the file .git/shallow exists.
Note that the git rev-parse needs a recent enough version of git;
With a recent version:
$ git rev-parse --is-shallow-repository
false
With an old version:
$ git rev-parse --is-shallow-repository
--is-shallow-repository
What Porting/cmpVERSION.pl appears to check are the git tags:
unless (length $tag_to_compare) {
die "$0: Git found, but no Git tags found\n"
unless $tap;
print "1..0 # SKIP: Git found, but no Git tags found\n";
exit 0;
}
An alternative (that is also in Porting/cmpVERSION.pl) is to check if the perl-1 commit exists:
if (-f '.git') {
# the hash of the initial commit in perl.git (perl-1.0)
my $commit = '8d063cd8450e59ea1c611a2f4f5a21059a2804f1';
my $out = `git rev-parse --verify --quiet '$commit^{commit}'`;
chomp $out;
if($out eq $commit) {
++$found;
}
}
A shallow clone can be detected by checking
Just a heads up: I'm working on a PR for this (#20124) because CI runs for PRs are currently red (since commit 16dd3f70cc16005d5af7146385733a7c945fb67e was merged on blead).
Hey @bram-perl I applied your changes, fixup-merged some of my own, and perltidied the sequence. I have two minor additional changes I want to merge that I will do right now (in the next 30 minutes), but i think then it will be good to go.
Re: #20124 thanks for picking that up. I guess my bad on merging that. :-(