Proposal: Lock module version in lookup.json
As we test more modules in CitGM, triaging failures does not seem to be scaling.
With any failure we need to work out whether it was caused by:
- the change in Node we're testing
- the module/module test suite being updated
- flaky tests
To make life easier I would suggest that we:
- Lock the modules to specific (known working) versions in the lookup.json, with appropriate flaky tags if necessary (a hypothetical example of a pinned entry is sketched below).
- Allow module authors to raise PRs to update the versions (when they feel like it). We can update modules ourselves in batches too (perhaps at weekly or monthly intervals).
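To make that concrete, here is a rough sketch of how a pinned entry could be resolved. The version field and the entry shapes are illustrative, not the current lookup.json schema:

```js
// Hypothetical sketch only: assumes a pinned "version" field that does not
// exist in today's lookup.json. Entry shapes are illustrative.
const lookup = {
  lodash:  { npm: true, version: '4.17.4' }, // pinned to a known-working release
  express: { npm: true },                    // unpinned, tracks the latest release
};

// Work out which npm spec a CitGM-style runner would install for each entry.
function installSpec(name, entry) {
  return entry.version ? `${name}@${entry.version}` : `${name}@latest`;
}

for (const [name, entry] of Object.entries(lookup)) {
  console.log(installSpec(name, entry)); // lodash@4.17.4, express@latest
}
```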
This means that instead of constantly having to stay on top of all of the module/module test suite updates, and regularly having to mark some of these tests as flaky, we could do this in batches at a regular interval.
Yeah, moving targets are never fun. I'm +1 on this.
@MylesBorins I'd be interested to know your thoughts on this.
I'm +1 on this. If we can encourage module authors to update their modules here then that would be an added bonus.
I'm still very -1 on this.
The whole point of citgm is to find failures
We have no clear mechanism to upgrade modules, or know when they need to be upgraded. It is inconvenient when they break, but isn't that kind of the point?
@MylesBorins ... how would you feel about an approach that used two lookup tables in CI... one with a last-known-good configuration of modules, and one with the current-set? Running differentials between those could help us track down regressions quite easily.
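As a rough illustration of what that differential could look like (this is not an existing citgm feature; the result format is assumed):

```js
// Sketch: given pass/fail results from a run against the last-known-good
// module set and a run against the current/latest set, separate failures that
// only appear with the latest module versions (likely a module-side change)
// from failures present in both runs (more likely the Node change under test,
// or an infra/flake issue).
function diffRuns(lastKnownGood, current) {
  const moduleRegressions = [];
  const likelyNodeRegressions = [];
  for (const [name, passedNow] of Object.entries(current)) {
    if (passedNow) continue;                              // green on latest: nothing to do
    if (lastKnownGood[name]) moduleRegressions.push(name);
    else likelyNodeRegressions.push(name);
  }
  return { moduleRegressions, likelyNodeRegressions };
}

// Hypothetical result maps: module name -> whether its tests passed.
const lkgRun = { lodash: true, express: true, request: false };
const latestRun = { lodash: true, express: false, request: false };
console.log(diffRuns(lkgRun, latestRun));
// => { moduleRegressions: [ 'express' ], likelyNodeRegressions: [ 'request' ] }
```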
Tbqh I'm still not sure that is super useful, and it sounds like quite a bit of work to implement / maintain. We should likely focus our efforts on finding out why our CI infra has false negatives that don't exist on other machines
I am kind of with @MylesBorins on this: the whole point of citgm is that we catch module failures as soon as they are released. I agree that perhaps having two lookup tables could work, though it could get messy, as you may have different versions being compatible with different platforms?
OK so I'm thinking about this a bit more and thinking specifically about the high level problem... people don't want to use CITGM because of the number of false negatives, and the fact that it is not intuitive to read the results
In the past I had been more on top of keeping the flakiness markers in the lookup table up to date. I would run CITGM on each release line, review the results, and update the lookup as appropriate. An obvious solution would be to make this a task that people sign up for... (or possibly an expected part of using CITGM if you experience flakiness).
Another problem is knowing if something is truly flaky. @gdams had started work on a stress test feature for CITGM that would make it easier for us to run a module multiple times to figure out just how flaky they were.
Perhaps we can even find a way to automate the above process, or make it far easier to verify whether a module is flaky or not (multiple failures auto-updating the lookup, for example).
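For what it's worth, the core of such a stress mode could be as small as the sketch below (assuming the citgm CLI is installed and on the PATH; the helper name and iteration count are made up):

```js
// Sketch: run one module's CitGM test N times and report a pass rate, so
// "flaky" can be decided from data rather than from a single red run.
const { execFileSync } = require('child_process');

function stressModule(name, iterations = 10) {
  let passes = 0;
  for (let i = 0; i < iterations; i++) {
    try {
      execFileSync('citgm', [name], { stdio: 'ignore' }); // exit code 0 => pass
      passes++;
    } catch (err) {
      // non-zero exit code => this iteration failed
    }
  }
  return { name, passes, iterations, passRate: passes / iterations };
}

console.log(stressModule('lodash'));
```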
@BethGriggs thanks for bringing this up! Obviously there is push back from the collaborators on using this tool regularly and we should definitely work on figuring out how to make it more user friendly
The whole point of citgm is to find failures
the whole point of citgm is that we catch module failures as soon as they are released
I disagree pretty strongly with this. The whole point of CitGM is to smoke-test whether Node.js core changes break the community. The purpose of CitGM isn't to fix module issues, the modules we test have their own CI.
It is inconvenient when they break, but isn't that kind of the point?
CitGM is not useful unless it's green. I don't think most breaking changes will be discovered only on the absolute latest module versions, so being a month or two (at most) behind the latest module should be fine (if anything older module versions probably use more deprecated features).
and it sounds like quite a bit of work to implement / maintain.
Implementing should just be a version in the lookup.json; maintaining means running with the latest versions and PRing said update. We could have a tool that auto-updates the lookup with the latest version if it passes.
We should likely focus our efforts on finding out why our CI infra has false negatives that don't exist on other machines
Sure, but we have to do that anyway; this is just one thing we can do to try to insulate people running CitGM on their PRs in core from having to deal with transient module issues.
What I tend to see is someone running CitGM, getting a bunch of failures, and then pinging @nodejs/citgm to ask whether they're expected. And I don't think we've been very good at answering those people.
IMO the possible reasons we might not want to do this are:
- Updating the versions will be more effort than fixing issues
  - Locking the versions means that CitGM stays green while we resolve any issues.
- There aren't enough module issues for it to be worth it
  - I'd say there are.
I disagree pretty strongly with this. The whole point of CitGM is to smoke-test whether Node.js core changes break the community. The purpose of CitGM isn't to fix module issues, the modules we test have their own CI.
When I run CITGM on a release I care that we are testing against what people will get when they run npm install. That is whatever the latest version of a module is. If it is broken, it is broken.
When I run CITGM on a release I care that we are testing against what people will get when they run npm install. That is whatever the latest version of a module is.
I would assume the vast majority of installs come from a module range specified in a package.json. Have any of the issues that CitGM has turned up come from the most recent version of a module?
If it is broken, it is broken
Okay, but if it is broken we just skip it in the lookup. It's not like we hold the release until all the modules are fixed.
If we had a mechanism to automatically update the lookup, raise a PR, and run CI on the latest release (e.g. v7.10.0 right now) every week, would that ease the issue? It could even be triggered before a release.
@gibfahn I think we can definitely introduce a more aggressive policy for skipping modules. Technically at this point any collaborator can make changes to the lookup without requiring any sign off.
What we really need is regular audits of the release lines with CITGM, where we ignore flakes and keep the lookup up to date. This could potentially be automated
https://github.com/nodejs/citgm/pull/321 is a start to keeping our lookup up to date
IMHO multiple tables makes sense. A sparse one for the general case, and a locked version one for each release line. This will allow us to differentiate our regressions and modules' regressions.
A version-specific lookup.json could sit in https://github.com/nodejs/node/tree/master/test, get PRs that are peer reviewed, and use semantic versioning...
I'm a huge -1 on multiple tables. lookup is the baseline and it is what we run off of. A separate table does not solve my concerns above regarding not keeping up to date... it simply doesn't enforce them in the main citgm.
I think the wiki has been a really good first take fwiw
I think the wiki has been a really good first take fwiw
Thanks! The only caveat is that you still need to know what's what, so I get people who ask me "is my CITGM run OK?", and then I essentially do a second manual lookup filter (the difference is that I have version context).
I personally think that CITGM would have a much higher value if it were green on average, even if the versions tested are not all up to date. The point is that they should still reflect most of the user base even if they are older. Let us get more modules in and have a better guarantee because of that, and not because of the newest version.
We should update the lookup table to the newest versions though, e.g. once every two months. Ideally CITGM could use something like greenkeeper.io, but I do not see a trivial way to implement that, and even then it would not help against flaky test suites.
I agree we need to figure out how to let more people run this usefully. To be useful it needs to be green most of the time when people need/want to run it. It would be interesting to see if a version with fixed module versions would be more stable or not.
Running something like a "stable" and a "canary" set, where new versions of modules get promoted from "canary" to "stable" once they have passed for some period of time, is a model that might balance the need to test new versions of modules while also allowing the tool to be more broadly used.
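A minimal sketch of what that promotion rule could look like (the set names, the record shape, and the two-week threshold are all made up for illustration):

```js
// Sketch: a module version sits in the "canary" set and is only promoted to
// the "stable" set once it has been passing for a configurable window.
const DAY = 24 * 60 * 60 * 1000;
const PROMOTION_WINDOW = 14 * DAY; // illustrative: two weeks of green runs

function promote(canary, stable, now = Date.now()) {
  for (const [name, info] of Object.entries(canary)) {
    const agedEnough = now - info.firstGreenRun >= PROMOTION_WINDOW;
    if (info.allRunsGreen && agedEnough) {
      stable[name] = info.version; // this version has earned its place in "stable"
      delete canary[name];
    }
  }
}

// Hypothetical canary state: when each version first went green and whether
// it has stayed green since.
const canary = {
  express: { version: '4.16.2', firstGreenRun: Date.now() - 20 * DAY, allRunsGreen: true },
  lodash:  { version: '4.17.4', firstGreenRun: Date.now() - 3 * DAY,  allRunsGreen: true },
};
const stable = {};
promote(canary, stable);
console.log(stable); // { express: '4.16.2' }; lodash has not been green long enough yet
```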
I once again am going to push back on this.
As the person who is doing the majority of the work with CITGM, making the majority of the updates to the lookup.json, and serving as one of the primary people using the tool day to day, I am not convinced that we would keep the lookup table up to date and am concerned that it would fall back on me to maintain it.
We do get failures, and it is a mixed bag of reasons.
Currently the majority of failures have been due to flakes or infra issues on our side. In fact, the 6.x line is for the most part green on citgm.
All of that being said, if someone is willing to do the work to implement installing specific versions I am willing to give it a try to see if it improves things; landing commits and testing the workflow is not a massive burden. One thing to keep in mind is that we should likely support tagged majors only, as a non-trivial number of npm modules are installed in a way that auto-updates to the latest minor.
I am not convinced that we would keep the lookup table up to date and am concerned that it would fall back on me to maintain it.
Yeah, this is definitely the core issue. I would say that I think the maintenance burden is likely to be the same either way (modules break pretty frequently in my experience), especially if we could get some automation.
However we might also fall into a false sense of security, where we think everything's fine because some ancient version of a module is green, but actually latest is broken (and we just never update the lookup version because the latest one is broken).
Alternative proposal
How about we do something more along the lines of:
Have an 'lkgr' (last known good) version in the lookup. First run tests on latest, and if the tests fail there then fall back to the lkgr. That way we should be able to quickly see that:
- Fail, Pass => Module broken by module update
- Fail, Fail => Module broken by node update
Obviously it won't work if the module is flaky, but nothing works if the module is flaky, so :man_shrugging: .
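A minimal sketch of that fallback logic, with a stubbed test runner (the runner, the spec format, and the lookup shape are hypothetical):

```js
// Sketch: try the latest release first; only if it fails, rerun against the
// pinned lkgr version to classify the failure.
function classify(runModule, name, lkgrVersion) {
  if (runModule(`${name}@latest`)) return 'pass';
  if (lkgrVersion && runModule(`${name}@${lkgrVersion}`)) {
    return 'module regression'; // Fail, Pass => broken by a module update
  }
  return 'node regression';     // Fail, Fail => likely broken by the node update
}

// Stub runner for illustration: pretend latest fails but the lkgr version passes.
const stubRunner = (spec) => !spec.endsWith('@latest');
console.log(classify(stubRunner, 'express', '4.15.3')); // 'module regression'
```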
I like the proposal of making this a fall back rather than a default. How would you propose we keep the lookup updated? How do we handle lkgr being different across release lines and platforms?
How would you propose we keep the lookup updated?
Ideally we'd have some automation for that (as suggested above).
How do we handle lkgr being different across release lines and platforms?
I can definitely see that snowballing out of control. I think for platforms we should just control it with "lkgr for all non-flaky platforms".
For release lines we might need multiple versions sometimes.
For release lines we might need multiple versions sometimes.
This leaves me with a feeling that keeping track of this stuff is going to explode in complexity.
Proposal:
- Leave CITGM as it is now to default to getting the version from npm.
- Add an option --run-lkgr:
  - this would use the lkgr version from the lookup instead of getting it from npm
- Add an option --update-lkgr:
  - check if the latest version from npm is the same as lkgr in the lookup:
    - same version -> no need to run the test
    - new version -> if the module succeeds then update lkgr, else if the module fails save the name
  - Raise a PR with all changes to the lookup and print a list of failing modules.
1. We would care about testing against what people will get when they run npm install, that is, whatever the latest version of a module is.
2. We should see citgm green while doing the release if node is not breaking any module; otherwise this would be an indicator of a potential issue with node.
3. If a newer version of a module fails on the same version of node that an older version of the same module passes on, this would be an indicator of a problem with the module.
There is much more to consider (lkgr per platform/node version, etc.), but having citgm green when doing the release would be ideal.
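As a rough sketch of what --update-lkgr could boil down to (the lkgr field, the file path, and the assumption that citgm accepts a name@version specifier are all illustrative, not existing behaviour):

```js
// Sketch of the proposed --update-lkgr flow: for each module, compare the
// latest npm version against the pinned lkgr, test only when they differ,
// bump lkgr on success, and collect the failures for a report/PR.
const fs = require('fs');
const { execFileSync } = require('child_process');

const LOOKUP_PATH = './lookup.json'; // illustrative path
const lookup = JSON.parse(fs.readFileSync(LOOKUP_PATH, 'utf8'));

// `npm view <pkg> version` prints the latest published version of a package.
const latestVersion = (name) =>
  execFileSync('npm', ['view', name, 'version'], { encoding: 'utf8' }).trim();

// Assumes citgm can be pointed at a specific version; non-zero exit => failure.
function passes(spec) {
  try {
    execFileSync('citgm', [spec], { stdio: 'ignore' });
    return true;
  } catch (err) {
    return false;
  }
}

const failing = [];
for (const [name, entry] of Object.entries(lookup)) {
  const latest = latestVersion(name);
  if (entry.lkgr === latest) continue;                  // same version: nothing to do
  if (passes(`${name}@${latest}`)) entry.lkgr = latest; // new version passes: bump lkgr
  else failing.push(`${name}@${latest}`);               // new version fails: record it
}

fs.writeFileSync(LOOKUP_PATH, JSON.stringify(lookup, null, 2) + '\n');
// A real tool would raise a PR with the lookup changes; here we just print the report.
console.log('modules failing on their latest version:', failing);
```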
So fwiw I'm still not 100% sold on this, but I really like the approach. The LKGR approach is particularly interesting, but I do fear the complexity of adding platforms / versions and the lookup exploding in size.
One idea I was kicking around was updating the CI job to run both the No-Build job and the Latest job and compare the differences.
TBH most of the failures we see are flakes / timeouts that are specific to infra... it is really hard to lock those down