feat(workflow): enforce coverage
🍰 Pull Request
enforce coverage
Enforce a minimum percentage of test coverage. If this requirement is not met, the check fails.
You can see it fail here, as the requirement is set to 100%.
You can see it succeed here.
This PR also fixes broken unit tests.
Issues
- Since PHPUnit itself does not support a coverage requirement (see the author's position in https://github.com/sebastianbergmann/phpunit/issues/1877#issuecomment-268761134), additional scripts seem to be needed.
- Use `phpunit-coverage-check` instead
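A minimal sketch of what such a coverage gate does, in the spirit of `phpunit-coverage-check`: read statement counts from a Clover XML report and fail when the percentage is below a threshold. The report contents and the 100% threshold below are illustrative, not this PR's actual configuration; PHPUnit would normally generate the report via `--coverage-clover`.

```shell
# Made-up Clover report, for illustration only.
cat > clover.xml <<'EOF'
<?xml version="1.0"?>
<coverage>
  <project>
    <metrics statements="200" coveredstatements="156"/>
  </project>
</coverage>
EOF

# Extract total and covered statement counts from the report.
STATEMENTS=$(sed -n 's/.* statements="\([0-9]*\)".*/\1/p' clover.xml)
COVERED=$(sed -n 's/.*coveredstatements="\([0-9]*\)".*/\1/p' clover.xml)
PERCENT=$((100 * $COVERED / $STATEMENTS))
THRESHOLD=100

echo "coverage: ${PERCENT}%"            # prints "coverage: 78%"
if [ "$PERCENT" -lt "$THRESHOLD" ]; then
  echo "FAIL: below the required ${THRESHOLD}%"
fi
```

In a real workflow the `if` branch would exit non-zero so the CI job is marked failed.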
Todo
- [ ] Unit tests are not enforced to pass before merge
  Example:
- [x] Discovered ~~flaky~~ broken tests - fixed in https://github.com/cypht-org/cypht/pull/1709/commits/118f8f143affee426268792bd268d320a28c2ce4
@marclaporte Please have a look. I am done here.
As this fixes the unit tests, I would like to ask that it be prioritized. If you have doubts about the coverage enforcement, I can separate the fix from the feature.
I would also like to discuss enforcing some workflows, so that broken tests can no longer land and coverage does not dwindle further.
Our coverage is not good, and enforcing a percentage will further deepen the problem with tests - it will push developers to add tests just for the sake of coverage instead of thinking about what really should be tested. Imagine adding some simple code that drops the coverage from 70.5% to 69.5%. To clear the 70% threshold you are forced to add tests, but the code you added is simple and doesn't need testing. You must either add tests for other code (which, done in the same MR, has issues of its own) or add tests for your own code, which is sometimes counter-productive. Having too many tests is also not desirable - it is more code to maintain, and having many tests doesn't mean they are quality tests, i.e. tests that really catch bugs when they should.
Thus, I am not a fan of coverage reports and am against enforcing any percentage here. Instead, we should think about the critical or complicated code blocks that should be tested. We can add a requirement for proper unit or integration tests with any new feature or major refactoring. We can and should enforce passing the test suite before merging, but only once the tests are stable - recently we have had too many intermittent Selenium test failures, for example. Overall, common sense should be used when deciding what to test and which tests to add - a raw percentage is not a suitable basis for this decision.
I disagree on all points with you here.
- Requiring developers to write tests ensures they also learn and cover this part of software development.
- Even a snapshot render test can prove useful to other developers, since they see that something changed where they did not expect it
- Unit tests are very cheap (compared to integration and full-stack tests)
- The projects I work on all use this mechanism and consistently keep coverage above 90%.
- I have also observed that introducing this mechanism, even with a low initial requirement like 5-10%, leads to increased coverage over time
- You can always lower the coverage requirement
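The "start low and raise over time" idea from the list above can even be automated as a ratchet: fail only when coverage drops below the best value seen so far, so the effective threshold rises on its own. A hypothetical sketch; the file name and numbers are made up:

```shell
# Hypothetical coverage "ratchet". The baseline file would normally be
# committed to the repository or cached between CI runs.
echo 10 > coverage-baseline.txt     # e.g. an initial requirement of 10%
CURRENT=78                          # would come from the coverage report
BASELINE=$(cat coverage-baseline.txt)

if [ "$CURRENT" -lt "$BASELINE" ]; then
  echo "FAIL: coverage ${CURRENT}% dropped below the ${BASELINE}% baseline"
else
  echo "$CURRENT" > coverage-baseline.txt   # ratchet the baseline upward
  echo "OK: baseline is now ${CURRENT}%"
fi
```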
> Instead, we should think about critical or complicated code blocks that should be tested.
Is there a way to help detect this, or is it pretty much based on the feedback of experienced developers?
Do you think a quantitative measure is good for ensuring a quality test suite? Forcing developers to write tests will lead to bloat if they are not careful enough; and if they are careful, they will already write quality tests - so why force them? Sorry, but I don't see the point here.
@ulfgebhardt have you actually compared the tests we have against the tons of module code (not even counting the complicated JS)? I think the actual "coverage" of the code is closer to 30% than 78% at the moment, just from common judgement, based on the fact that:
- I have worked on cypht's codebase in the past several years
- I have done pretty major refactorings and have reviewed most of the testing code
This percentage enforcement is very misleading.
> Instead, we should think about critical or complicated code blocks that should be tested.
Is there a way to help detect this, or is it pretty much based on the feedback of experienced developers?
I am not sure there is an automated way to detect this, but we can establish strong guidelines and make PR reviewers check them besides the contributors.
@kroky I did not compare what is covered and what is not - and it's not my task to do so. There might be portions of the software which are not covered (e.g. all the JavaScript, as these are PHPUnit tests) and not detected as needing coverage, but that is something you would notice during development and change, rather than me actively checking it in an unfamiliar codebase.
My method to verify it works: Coveralls (a tool you already use) reports 80%, while this PR shows 78%, which is sufficiently close in my opinion.
This also gives developers new tools for discovering uncovered code. Here is an example from a different project, but the same can be made available to developers in this project once #1710 is solved.
An overview of which files are covered and to what extent:
Detect uncovered lines and branches:
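For the sake of illustration, a per-file overview like the one above can be approximated directly from the same Clover report that the coverage check reads. The report contents and file names below are made up:

```shell
# Made-up Clover report with per-file metrics, for illustration only.
cat > clover.xml <<'EOF'
<coverage>
  <project>
    <file name="modules/example/modules.php">
      <metrics statements="120" coveredstatements="30"/>
    </file>
    <file name="lib/example.php">
      <metrics statements="50" coveredstatements="45"/>
    </file>
  </project>
</coverage>
EOF

# Print per-file coverage: pair each <file name="..."> with the metrics
# line that follows it. The parse is crude but fine for this fixed layout.
REPORT=$(awk -F'"' '
  /<file name=/       { file = $2 }
  /coveredstatements/ {
    split($0, a, /statements="/)
    total   = a[2] + 0
    covered = a[3] + 0
    printf "%3d%%  %s\n", 100 * covered / total, file
  }
' clover.xml)
echo "$REPORT"
```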
Quoting @kroky:

> I think the actual "coverage" of the code is closer to 30% than 78% at the moment just from common judgement based on the fact that:
>
> - I have worked on cypht's codebase in the past several years
> - I have done pretty major refactorings and have reviewed most of the testing code
I do not doubt your competence regarding cypht. I just want to provide a common solution to a problem the project currently faces and I happen to know that this method has achieved its goal on other repositories after being implemented.
The metric of a "good review" is obviously flawed, as there are broken unit tests in your master and coverage has decreased over time from 95% to now 78% (assuming that number is actually correct).
This approach provides a structural solution to exactly this problem; it is standard all around GitHub; it is proven to work and makes software better and more reliable.
And while I understand that being forced to write tests is very annoying in the beginning, this is how it is done. I started with the same mindset, but that has changed - now I actually like ensuring things work with tests, and I often discover flaws, shortcomings, or edge cases I had not thought about while implementing the tests.
I propose this because the software currently requires a lot of manual testing - which I am doing, reportedly as one of the few, discovering a lot of flaws in the process.
This would solve that problem, so that I am no longer needed and software quality increases.
@ulfgebhardt thanks for your notes and advice! I appreciate you taking the time to do all this manual testing, reporting bugs and dealing with the consequences. I can assure you there are others doing the same. A major part of the current situation with broken tests in master and dealing with a lot of bugs (which, btw, are not caught by the existing unit tests) is the lack of reviews Cypht suffered for at least a year. I have requested to review all code merged into master for a month or so in order to make it stable again and release stable versions. Things are progressing smoothly but maybe more slowly than we anticipated.
I can give you several PR examples that contained tests that would have increased the coverage but actually tested almost nothing - they contained copy-pasted chunks of code from the modules and tested code running in the tests themselves rather than the module code. This needed manual review and advice from a more senior developer so that the tests became useful and actually contributed to the coverage. Additionally, the developers gained experience and learned how to deal with such situations in the future, so this manual review resulted in many benefits, and I can't consider it "flawed". My main point here is that a numeric percentage is not what we need to deal with the current situation. It might be harmful and misleading if we have to deal with sloppily written tests. That's why I am against forcing it.
However, I would definitely want to:
- add a requirement for unit and integration tests to pass before code can be merged into master
- show a code coverage report in the CI and warn/advise the developer to consider updating their code if coverage drops. It will at least be a nice reminder to add tests you may have forgotten
- establish commit and code quality guidelines that include do's and don'ts for automated tests
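The second item, warn without blocking, could look roughly like this in a CI step. This is a sketch, not the project's actual setup; the numbers are illustrative, and the `::warning::` line uses GitHub Actions' annotation syntax to surface the message in the checks UI without failing the job:

```shell
# Non-blocking coverage advisory: warn when coverage is below the target,
# but always finish successfully so the merge is not blocked.
CURRENT=78    # would come from the coverage report
TARGET=80     # advisory target, not a hard gate

if [ "$CURRENT" -lt "$TARGET" ]; then
  echo "::warning::coverage ${CURRENT}% is below the ${TARGET}% target"
fi
echo "check finished without failing"
```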
An interesting article illustrating my points: https://stackoverflow.blog/2025/09/29/making-your-code-base-better-will-make-your-code-coverage-worse/
I have pushed a fix for the tests, so they now pass in master. I have also added a restriction for merge requests: they must be up to date with the latest master, pass the checks, and have their conversations resolved before merging. The next step is to add a visual representation of the coverage percentage as a check and show it to the developer.