pulsar
Automated security and update routine before every release
Is your enhancement request related to a problem? Please describe.
To get the most out of every release in terms of security, performance and "bug-freeness", it may be a good idea to make a reasonable update of dependencies a routine before every release.
Describe the solution you'd like
What would help (if not already in use):
- enabling GitHub's alerts for vulnerable dependencies for Pulsar, see https://docs.github.com/en/free-pro-team@latest/github/managing-security-vulnerabilities/about-alerts-for-vulnerable-dependencies
  -> if possible, a bot should automatically open an issue to fix these findings / update the dependencies as soon as fixes are available
- since possibly not all vulnerabilities are reported/found, it may also be a good idea to have a dynamic/automated table of dependencies:
  - column 1: name of the dependency
  - column 2: version of the dependency used in the latest Pulsar release, e.g. see https://frontbackend.com/maven/artifact/org.apache.pulsar/pulsar/2.6.2
  - column 3: latest available version of the dependency (if hosted on GitHub: accessible via the GitHub API)
  -> before every release, one should look at this table and update all (or most) dependencies to their latest versions, or add a note explaining why this is not possible at this time (e.g. incompatible changes). Of course, opening update issues could be automated as well, but this may result in too many intermediate steps between releases.
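A minimal sketch of the dependency table described above, assuming the "used" versions (e.g. collected from the release's pom.xml files) and the "latest" versions (e.g. from the GitHub API or a Maven repository) are already available as plain dictionaries; fetching them is not shown. The helper names `parse_version`, `is_outdated` and `render_table` are hypothetical, and the version comparison is deliberately naive (numeric parts only, qualifiers such as `-beta` ignored).

```python
# Hypothetical sketch: build the "used vs. latest" dependency table as markdown.
# Inputs are plain dicts; how they are filled (pom.xml parsing, GitHub API,
# Maven repository lookups) is out of scope here.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.6.2' into (2, 6, 2); qualifiers such as '-beta' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not "0" <= ch <= "9":
                break  # stop at the first non-digit, e.g. '0-beta' -> '0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_outdated(used: str, latest: str) -> bool:
    """True if `latest` is numerically newer than `used` (missing parts count as 0)."""
    a, b = parse_version(used), parse_version(latest)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.2' equals '1.2.0'
    b += (0,) * (width - len(b))
    return a < b


def render_table(used: dict[str, str], latest: dict[str, str],
                 notes: dict[str, str]) -> str:
    """One row per dependency; unknown latest versions are marked with '?'."""
    lines = ["dependency | used | latest | outdated | note",
             "---|---|---|---|---"]
    for name in sorted(used):
        newest = latest.get(name)
        if newest is None:
            newest, status = "unknown", "?"
        else:
            status = "yes" if is_outdated(used[name], newest) else "no"
        lines.append(f"{name} | {used[name]} | {newest} | {status} | {notes.get(name, '')}")
    return "\n".join(lines)
```

For example, `render_table({"lib-a": "1.2", "lib-b": "3.0.1"}, {"lib-a": "1.2.0", "lib-b": "3.1.0"}, {"lib-b": "incompatible API change"})` flags only `lib-b` as outdated and carries the note explaining why it was not updated (the library names are placeholders). Real Maven version ordering has more rules than this sketch handles.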
Here is a blog post announcing the availability of automatic code scanning for security: https://github.blog/2020-09-30-code-scanning-is-now-available/
@hpvd thank you for reporting this. We will consider it in our future releases.
A new GitHub feature that may also lead to some kind of "security routine" when merging pull requests was presented at GitHub Universe 2020: "Dependency Review". From the announcement:
> Dependency review
>
> Today, dependency graph helps you understand your dependencies, and security alerts notify you of newly discovered vulnerabilities in your dependencies. But what if you could receive these alerts before introducing vulnerable code through new or updated dependencies? Dependency review helps reviewers and contributors understand dependency changes and their security impact at every pull request.
See https://github.blog/2020-12-08-new-from-universe-2020-dark-mode-github-sponsors-for-companies-and-more/ and also https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/reviewing-dependency-changes-in-a-pull-request
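Conceptually, a dependency review compares the dependencies of the base branch with those of the pull request. The sketch below is not GitHub's implementation; it simply diffs two hypothetical `name -> version` maps to show the additions, removals and version changes a reviewer would want to look at.

```python
# Hypothetical sketch of the idea behind a dependency review:
# diff the dependency set of the base branch against the pull request.


def dependency_diff(base: dict[str, str], head: dict[str, str]) -> dict[str, list]:
    """Compare two name -> version maps (base branch vs. pull request)."""
    added = sorted((name, head[name]) for name in head.keys() - base.keys())
    removed = sorted((name, base[name]) for name in base.keys() - head.keys())
    changed = sorted(
        (name, base[name], head[name])
        for name in base.keys() & head.keys()
        if base[name] != head[name]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Only the `added` and `changed` entries can introduce new vulnerable code, so those are the ones a security check would focus on.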
These points could possibly be classified as "low-hanging fruit" in the field of security (at least if they work as expected and do not introduce too many false-positive findings...)
As a last point on this topic: it may also be interesting to give GitHub's "super-linter" a try and let it check the whole project on every release or on every pull request via a GitHub Action, see https://github.com/github/super-linter/
We use the dependency-check-maven plugin to automate CVE checks of the dependencies in use against an up-to-date vulnerability database during the build process. It is pretty straightforward.
Cool @fmiguelez! Would you please push a PR to enable this great plugin? Also, this should be checked in CI to avoid introducing known CVE issues.
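Checking this in CI essentially means turning scan findings into a pass/fail exit code. dependency-check-maven can do this itself via its `failBuildOnCVSS` setting; the sketch below only illustrates the gating logic over a simplified, hypothetical list of findings (the `ScanFinding` type is not the plugin's report format). The default threshold of 7.0 is an assumption, chosen because that is where the CVSS v3 "High" rating starts.

```python
# Hypothetical CI gate: fail the job if any finding reaches a CVSS threshold.
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanFinding:
    dependency: str
    cve: str
    cvss: float  # CVSS base score, 0.0 to 10.0


def blocking(findings: list[ScanFinding], threshold: float = 7.0) -> list[ScanFinding]:
    """Findings at or above the threshold."""
    return [f for f in findings if f.cvss >= threshold]


def gate(findings: list[ScanFinding], threshold: float = 7.0) -> int:
    """Return a process exit code: 0 = pass, 1 = fail the CI job."""
    hits = blocking(findings, threshold)
    # Report the worst findings first so they are visible in the CI log.
    for item in sorted(hits, key=lambda item: item.cvss, reverse=True):
        print(f"{item.dependency}: {item.cve} (CVSS {item.cvss})", file=sys.stderr)
    return 1 if hits else 0
```

A CI step would load the findings from the scanner's report and end with `sys.exit(gate(findings))`.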
Hello guys
We are trying to certify Pulsar against a few security standards. We scanned the Pulsar 2.7.0 image with WhiteSource. Unfortunately, 167 high-risk CVEs were discovered in 55 outdated libraries that were marked as high-risk vulnerable.
This "bit" complicates our effort to certify Pulsar for a highly secured production environment :disappointed:
On the other hand, there is this open issue about automated security scanning.
Any chance to move this issue forward, or at least to upgrade the high-risk outdated libraries? It could significantly boost the adoption of Pulsar in many security-regulated environments.
Many thanks @alexku7 for describing your findings and view in detail, including the concrete consequences. IMHO this is an obstacle not only for "highly secured production environments" but for a considerable share of possible production uses. As I tried to describe in the issue and its comments, it's not only about security but also about performance and "bug-freeness", both of which can save a lot of time analyzing, locating and solving problems that may already have been fixed by others... If this is taken care of as a routine in every release, it should be (after an initial bigger step) a manageable amount of work and a smart way to catch some of the "low-hanging fruit"...
-> Could there be better advertising for Pulsar's awesome quality than being used by people and companies working in highly secured fields?? :-)
Yeah, these code / dependency / image scanners are pretty harsh, but several of our own customers want security reports for all dependent software, so any effort to minimize these issues in Pulsar, especially in a maintenance release (e.g. 2.6.4), could be extremely valuable. And if there's a documented process for mitigating them in a PR, then even someone like me could probably do it, as it's in our own interest, and I'd be happy to deliver value to the broader community :-)
Of course, we have also seen the major work in the fields of security and code quality in the past months (probably going live in v2.8), such as:
- enabling SpotBugs in many components
- working on E2E encryption
- fixing things that result in flaky tests
- etc...
-> This is pretty awesome and important. Beyond that, this issue is about the routines and automation that make it possible to get the most out of all the work put into Pulsar.
@alexku7 I would be happy to see the statistics when scanning the upcoming v2.8 with the same tool (WhiteSource)!
Sure :) No problem. I posted the exported report for 2.7.0 in the Slack channel: https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1614882939234100
There's now #10855 to add a scheduled OWASP Dependency Check to scan library vulnerabilities once per day.
@lhotari this is great news! Thanks so much!
awesome ;-)
The results of the scheduled OWASP Dependency Check scans can be found here: https://github.com/apache/pulsar/actions/workflows/ci-owasp-dependency-check.yaml
Just another topic for further optimizing code quality and security: use automatic fuzzing to find bugs (e.g. as part of CI / via a GitHub Action), see https://github.com/apache/pulsar/issues/12789
-> with the latest options for integrating it into the CI process, this is now relatively easy to use, yet powerful.
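To illustrate the basic idea of fuzzing: feed many generated inputs to a function and record every input that makes it fail in an unexpected way. The sketch below is a naive random fuzzer with a toy, hypothetical target (`parse_host_port` is not Pulsar code); real CI fuzzing tools are coverage-guided and far more effective, so this only shows the principle.

```python
# Naive random fuzzing sketch with a toy, hypothetical target.
import random


def parse_host_port(text: str) -> tuple[str, int]:
    """Toy target: parse 'host:port'; raises ValueError on bad input."""
    host, _, port = text.rpartition(":")
    if not host:
        raise ValueError("missing host")
    number = int(port)  # int() raises ValueError for non-numeric ports
    if not 0 < number < 65536:
        raise ValueError("port out of range")
    return host, number


def fuzz(target, iterations=10_000, seed=0, alphabet="ab:0123456789-+ "):
    """Feed random strings to `target`; collect inputs raising anything but ValueError."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    crashes = []
    for _ in range(iterations):
        data = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
        try:
            target(data)
        except ValueError:
            pass  # documented, expected rejection of bad input
        except Exception as exc:  # anything else is a bug worth reporting
            crashes.append((data, exc))
    return crashes
```

For instance, fuzzing `lambda s: 10 // len(s)` quickly reports the empty string, which triggers a `ZeroDivisionError`.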
I just learned about GitHub's dependency graph. Looking at it for Pulsar, there are:
- 200+ dependencies found
- many of these are somewhat outdated, and newer versions have been available (e.g. for several months or even 1 or 2 years)
- some are forks of other projects that are no longer updated (e.g. not since May 2015, while the "origin" is still maintained: https://github.com/cybernetics/hppc/tags vs https://github.com/carrotsearch/hppc/releases/tag/0.9.1)
dependency graph for pulsar: https://github.com/apache/pulsar/network/dependencies
Just to give a first impression without having to leave this issue:
Dependencies defined in | Number of dependencies |
---|---|
pom.xml | 170 |
tests/pom.xml | 1 |
docker/pom.xml | 1 |
pulsar-io/pom.xml | 1 |
testmocks/pom.xml | 7 |
buildtools/pom.xml | 20 |
pulsar-sql/pom.xml | 19 |
distribution/pom.xml | 1 |
pulsar-proxy/pom.xml | 23 |
…/pulsar/pom.xml | 5 |
pulsar-broker/pom.xml | 49 |
pulsar-client/pom.xml | 24 |
pulsar-common/pom.xml | 33 |
…/website/package.json | 16 |
jclouds-shaded/pom.xml | 3 |
managed-ledger/pom.xml | 14 |
... | ... |
With this high number of dependencies of all kinds and ages, the main question bothering me is:
=> Is it enough (or at least the best we can do at this time) to identify and update only the dependencies with already well-known/reported security issues, as addressed in https://github.com/apache/pulsar/pull/13972 (which is great, of course!!)?
-> a) Or is there a big risk, not yet visible, of sacrificing security, performance and bug-freeness (see the goal of this issue: https://github.com/apache/pulsar/issues/8815#issue-756101012) due to the other dependencies (with no security risks reported yet) for which updates are already available (sometimes for a long time)?
-> b) How can we be sure that every dependency introduced several years ago is still in use / really needed in today's Pulsar?
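For question b), the maven-dependency-plugin's `dependency:analyze` goal is the established tool: it inspects bytecode and reports declared-but-unused dependencies. As a rough first pass, a text-level check can also flag dependencies whose packages are never imported anywhere. The sketch below assumes a hypothetical `dependency -> package prefix` mapping and reads Java sources as strings; it misses fully-qualified usages, reflection, ServiceLoader and runtime-only dependencies, so its results are candidates for review, not proof.

```python
# Hypothetical first-pass check: which dependencies are never imported?
import re

# Matches "import a.b.C;", "import static a.b.C.m;" and "import a.b.*;".
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)


def imported_names(java_sources: list[str]) -> set[str]:
    """Collect all imported names; wildcard imports keep a trailing dot."""
    names: set[str] = set()
    for text in java_sources:
        names.update(IMPORT_RE.findall(text))
    return names


def unreferenced(dep_packages: dict[str, str], java_sources: list[str]) -> list[str]:
    """Dependencies none of whose packages appear in any import statement."""
    names = imported_names(java_sources)
    return sorted(
        dep for dep, prefix in dep_packages.items()
        if not any(n == prefix or n.startswith(prefix + ".") for n in names)
    )
```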
Just to show that the numbers are constantly growing (yes, this is no statistic ;-) it only conveys the feeling...): from yesterday to today, one more dependency was introduced.
Dependencies defined in | Number on 01 Feb 2022 | Number on 02 Feb 2022 |
---|---|---|
pom.xml | 170 | 171 |
Very good questions.
@nicoloboschi and @dlg99 from DataStax have been contributing many changes to address vulnerable library versions. DataStax has bought a license for Sonatype IQ Server and also scans Apache Pulsar frequently.
Another aspect of software supply chain security is build reproducibility: are the built artifacts really built from the source code they claim to be built from? For Java projects, there's more information at https://reproducible-builds.org/docs/jvm/ and https://github.com/jvm-repo-rebuild/reproducible-central. It would be good to make Apache Pulsar part of the Reproducible Builds program. Reproducible builds have been discussed a few times.
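Verifying reproducibility boils down to building the same source twice, independently, and checking that the artifacts are byte-for-byte identical. A minimal sketch, assuming both builds' output directories are available locally; it compares SHA-256 checksums of all `.jar` files (the directory layout and the `compare_builds` helper are hypothetical).

```python
# Hypothetical check: compare .jar checksums from two independent builds.
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_builds(first: Path, second: Path) -> dict[str, list[str]]:
    """Report jars missing from either build and jars whose bytes differ."""
    def index(root: Path) -> dict[str, str]:
        return {p.relative_to(root).as_posix(): sha256_of(p)
                for p in root.rglob("*.jar") if p.is_file()}

    a, b = index(first), index(second)
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

A build counts as reproducible for this check when all three lists are empty; differing jars are typically caused by embedded timestamps or file ordering.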
@hpvd Since the mailing list is the main channel for making major decisions in Apache projects, it would be useful to bring up your improvement suggestions to the Apache Pulsar community. [email protected] would be a good list to have this discussion. Mailing list details are at https://pulsar.apache.org/en/contact/ .
Many thanks for your answer, the additional details and the advice! I will bring some points to the list within the next few weeks...
BTW: does anybody look at Pulsar with a tool like JArchitect to keep a good overview of dependencies? It sounds interesting/helpful to me:
dependency graphs, etc.
https://www.jarchitect.com/JArchitectv2020
JArchitect comes with several facilities that allow the efficient dependency management. In seconds you can know which part of the code will be impacted if you refactor a class, you can be advised if a layer dependency violation has been accidentally created, you can pinpoint precisely which part of the code relies on a particular tier component, you can list methods that can be reached from a given method etc…
Edit: deactivated the active link.
Edit 2: there seems to be a trial: "The trial license is fully featured, but time limited (14-day free trial.)"
Another interesting topic in the field of automatic security scanning: automatic scanning for CWEs (in addition to CVEs), see https://github.com/apache/pulsar/issues/17069
Just to summarize the current state: our current procedure/routine seems to have missed 35 fixable vulnerabilities (CVEs) when releasing the latest version, 2.10.2.
Okay, a (very) few fewer if:
- not all of them were publicly known on release day last week
- we do not want to fix all of them (why??)
- or we can't fix all of them immediately because of really major changes in how dependencies work, which take more time to adapt to...
For details, see https://github.com/apache/pulsar/issues/18348
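The first caveat above can be made precise: a CVE only counts as "missed" if it was public and a fixed dependency version existed on release day. A minimal sketch of that filter, assuming each finding carries its publication date and the release date of the first fixed version (the `CveFinding` record and helper names are hypothetical, not a scanner's format).

```python
# Hypothetical sketch: which findings were already fixable on release day?
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CveFinding:
    dependency: str               # e.g. "group:artifact:version" (hypothetical format)
    cve: str                      # CVE identifier
    published: date               # day the CVE became public
    fix_released: Optional[date]  # release day of a fixed version, None if none yet


def missed_at_release(findings: list[CveFinding], release_day: date) -> list[CveFinding]:
    """Findings that were public AND already fixable upstream on release day."""
    return [
        f for f in findings
        if f.published <= release_day
        and f.fix_released is not None
        and f.fix_released <= release_day
    ]


def count_distinct_cves(findings: list[CveFinding]) -> int:
    """One CVE can affect several dependencies; count each identifier once."""
    return len({f.cve for f in findings})
```

Findings deliberately left unfixed (the second caveat) would still need a separate, documented exception list.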
Moved to the open-ended discussion forum.
I suggest sending patches directly; the maintainers will be glad to review them. Repeatedly requesting help achieves little: open-source software grows with contributions.