
RFC: adaptations of development process to allow quicker releases

Open wchristian opened this issue 9 years ago • 12 comments

In the pursuit of getting more releases out, i've done some work on the repo today and reworked master. My plan is that, in the future, master will consist, release-over-release, of 3 phases:

  1. all the uncontroversial commits up front, i.e. typo fixes, doc amendments, plumbing work, etc.
  2. sets of tests only, if necessary extracted out of pull request commits, with failing tests marked TODO
  3. actual fix commits, which update the Changes file and unmark TODO tests as appropriate

This will help us identify which category any given test result falls into after a set of fixes is applied, and what its ramifications for the next release are. These categories are (see the sketch after the list):

  • tests that pass, and previously passed, we want to keep those as they are
  • tests that pass, and previously failed in TODO, that is the "payload" of a release
  • tests that fail, and previously failed in TODO, these are just maintenance of status quo; we can release with those
  • tests that fail, and previously passed, these indicate a fix breaking something, a regression; we can't release with those
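
To illustrate, a minimal Test::More sketch of how each category surfaces in TAP (the test names and the bug reason are made up):

    use strict;
    use warnings;
    use Test::More tests => 4;

    ok 1, 'parses plain scalar';        # "ok 1" - passed before, passes now: keep as-is

    TODO: {
        local $TODO = 'heredoc parsing bug (hypothetical)';
        ok 0, 'parses nested heredoc';  # "not ok 2 # TODO" - known failure, status quo: releasable
        ok 1, 'parses simple heredoc';  # "ok 3 # TODO" - unexpected pass: the "payload" of a release
    }

    ok 0, 'parses simple quote';        # "not ok 4" - passed before, fails now: regression, blocks release

A failing test inside the TODO block does not fail the suite, and prove calls out an unexpectedly passing TODO test separately in its summary, which is exactly the signal we want front and center.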

To make this kind of analysis as useful as possible, we need to front-load adding tests as much as possible, especially when they mark already-existing bugs, since that helps identify the bugs we can disregard and in whose presence we can release safely.

@moregan Any thoughts on this? You can see a sketch of master with phase 1+2 in the repo.

wchristian avatar Aug 19 '15 13:08 wchristian

Should 2 say "Set of tests only"?

Otherwise I'm not sure what the difference between 2 and 3 is.

wolfsage avatar Aug 19 '15 13:08 wolfsage

@wolfsage Correct, fixed it, thanks. :)

wchristian avatar Aug 19 '15 13:08 wchristian

I guess that sounds good to me if you're willing to do the work to manage the releases this way. It seems a little heavy to me, though. The same end goal could be achieved by:

  • Getting master to a point where there are no more passing TODO tests
  • Merging in a pull req
    • If a TODO test starts unexpectedly passing - great!
    • If a normal test starts failing - uhoh!

Doing it this way seems to provide the same benefits.

wolfsage avatar Aug 19 '15 14:08 wolfsage

@wolfsage: The problem with that is that we have a lot of different commits queued up, and it's hard to tell which failures thrown up by them are long-standing bugs, and which ones are one fix breaking something a previous commit fixed. Front-loading the tests removes ambiguity there, and makes it possible to quickly pull in fixes known to work fine, make a release and postpone problematic fixes for later.

My goal here is to try and get as much good out in as little time as possible, and right now PPI has a stupendous amount of fixes waiting in the wings.

Once we have a lot more coverage and fewer waiting commits, it's completely uncontentious to relax this again.

Does this make sense to you or am i deluding myself?

wchristian avatar Aug 19 '15 14:08 wchristian

I'm not totally sure I follow, but if you see a benefit I say try it and see :)

wolfsage avatar Aug 19 '15 14:08 wolfsage

@wolfsage: If you see no obvious problems then that is good for me. :)

Also, maybe this will make it a little more clear: Test failures can also happen through multiple fixes (each fine on its own) interacting when combined. These are slowing us down because they usually need to be fixed by reworking the guts.

This procedure is meant to make it easy to find when that happens and to make it distinct from fixes breaking tests on their own (pure regressions).

wchristian avatar Aug 19 '15 14:08 wchristian

If this will help you and we'll get more releases out, I'm willing to go with it. (I would like master to (continue to) always pass all tests.)

I see the branches 'uncontroversial' and 'new_tests' in the repo, but am unclear on what my workflow would be in detail. I will need some help. E.g.: do these "test sets" go into master or into new_tests; in what areas will I need to bring my pitiable git skills up to snuff; etc.

I don't quite see how TODO-foo with tests discovers issues any better than requiring a branch to be current with master and passing tests, but that will surely become clear.

moregan avatar Aug 20 '15 05:08 moregan

@moregan:

Two things you can do:

  1. If you see any commits that are uncontroversial and which i missed, integrate them into that branch.
  2. If you feel like putting some time in, pick a test file touched by the commits after the tag of the last release (and probably let me know which, so we don't duplicate work), then:
    • use git log <file> to figure out which commits have added tests to it,
    • cherry-pick those onto new_tests, dropping changes to files other than the test script during conflict resolution,
    • squash these commits together with an interactive rebase, removing any remaining changes to other files,
    • run the test script and mark anything failing as TODO (or, if you don't see an easy way to do so, poke me to do it).

> I don't quite see how TODO-foo with tests discovers issues any better than requiring a branch to be current with master and passing tests, but that will surely become clear.

I'm not sure how to explain it well. Let me try again. :)

tests0+fix0 -> issue 1 found -> tests1+fix1 -> issue 2 found -> tests2+fix2 -> issue 3 found -> tests3+fix3

And this keeps happening and the issues keep getting more complicated and more difficult to fix. We try to not release with known issues, but since we keep finding new ones and they keep getting more complicated, this means we cannot realistically release at all if we don't accept that we must release with known issues.

Now the question becomes figuring out which issues we can release with and which ones we can not release with.

Look at that chain above again and consider the hypothetical question of: Did issue 3 exist before fix0? Or was it created by fix0, fix1 or fix2?

This question is important because if an issue existed before fix0, then we can safely ignore it and release without needing to fix it immediately, since such a release would not mean releasing with a new bug, only with an old one that's always been in place.

Now consider what happens if we rebase that chain and split it up into this structure:

tests0+tests1+tests2+tests3+todos -> fix0+untodo -> fix1+untodo -> fix2+untodo

If, while applying fix0-2, additional test failures appear, then we know one of those fixes introduced new bugs and needs to be postponed, along with the fix3 that would then have been necessary to fix those regressions; if however no additional test failures appear, then we know fix0-2 are safe and fix3 actually fixed a long-standing bug.

wchristian avatar Aug 20 '15 12:08 wchristian

Here is the logic I've always applied for issues.

  1. Parsing Perl is provably impossible, and we can never make a parser without issues. This is why we implement round tripping so fanatically (see the sketch after this list).
  2. The implementation of the parser is thus fractally complex; there are always going to be smaller and smaller issues we don't handle.
  3. Issues that are common in real world code matter. Issues that never occur in real world code don't matter. Issues that occur more in real world code matter more.
  4. Issues that occur in real world code become bug reports, which become test cases, which ensure those issues remain addressed. Issues that occur rarely or don't matter much don't get noticed or reported, and so don't get monitored for regressions.
  5. Thus, if there is an issue or potential issue that has still managed to be excluded from the test cases and samples and regressions after so many years, then it doesn't matter enough to prevent us from moving ahead and fixing issues that people do notice.
  6. If needed, issues raised in bug reports that are hard to fix and hypothetical or extraordinarily rare can be considered WONTFIX if fixing them places a suitably large burden on the rest of the codebase.
  7. In addition to our tests, I also consider the test suites of our largest consumers reasonable test cases for us to check before releases.
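
For anyone following along, the round-tripping invariant from point 1 boils down to a test like this; the sample snippet is made up, but PPI::Document->new and serialize are the real API:

    use strict;
    use warnings;
    use Test::More tests => 1;
    use PPI;

    # made-up sample; in practice this would be real-world source code
    my $source = 'my @pairs = map { $_ => 1 } grep { /x/ } @input;';

    my $doc = PPI::Document->new( \$source )
        or die "parse failed: " . PPI::Document->errstr;

    # no matter how PPI understood the code, serializing the
    # parse tree must reproduce the original source exactly
    is( $doc->serialize, $source, 'source round-trips exactly' );

The invariant holds even for code the parser misunderstands, which is what makes it such a cheap, universal check.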

So in summary, fix the things you can fix and that people care about. Be cautious, but don't get too hung up on issues we don't know about or can't fix yet.

And I'd say don't add failing tests to the trunk if you don't plan to fix them straight away. I've been caught before, unable to make a release fixing other issues, because I hadn't fixed something that I didn't have time to resolve yet.

I hope this helps contribute to your thinking in this area.

Adam

adamkennedy avatar Aug 23 '15 14:08 adamkennedy

> 1. real world vs. constructed issues
> 2. ease people's pain points

Those are very useful reminders, thank you!

> 1. consider consumers' test suites

I wonder if we should make a tool like the one Moose has that tests users' suites.

> don't add failing tests to the trunk if you don't plan to fix them straight away

I agree, which is why i plan to go with TODO tests. They will fail, but TAP will interpret that as a success, since right now we expect them to fail. They can be perused at a later time for things one might wish to fix when bored; and if something fixes them, TAP will speak up about it, so we'll know when a failing TODO test becomes solved.

It's also nice to hear that you've been in the same situation i found myself in. :)

That does give me confidence that i'm on the right track.

wchristian avatar Aug 23 '15 19:08 wchristian

If you can, I'd recommend quarantining all todos into their own subdirectory of tests, unless one is really high priority and we definitely intend to fix it in future.

It's mostly just for organisational cleanliness.

Adam

adamkennedy avatar Aug 23 '15 23:08 adamkennedy

Moving them is a little high on the effort scale, since the code to trigger TODO exceptions is a little involved ( https://metacpan.org/pod/Test::More#TODO:-BLOCK ).

I also asked #toolchain if there was an easy way to filter them out other than prove -vrl | grep TODO, but they said we'd have to make a custom TAP parser for that.

Luckily prove -vrl | grep TODO is quick and easy to remember.
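
That said, if the grep ever stops being enough, the custom parser wouldn't have to be written from scratch; a sketch using the core TAP::Parser module (the test file name is hypothetical):

    use strict;
    use warnings;
    use TAP::Parser;

    # run one test script and report only its TODO tests
    my $parser = TAP::Parser->new( { source => 't/ppi_token.t' } );
    while ( my $result = $parser->next ) {
        next unless $result->is_test && $result->has_todo;
        printf "%s: %s\n",
            $result->is_actual_ok ? 'TODO passed' : 'still TODO',
            $result->description;
    }

That would let us list the known failures (and any unexpected TODO passes) without wading through full verbose output.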

wchristian avatar Aug 24 '15 01:08 wchristian