Automated security and update routine before every release

Open hpvd opened this issue 4 years ago • 27 comments

Is your enhancement request related to a problem? Please describe.

To get the most out of every release in terms of security, performance and "bug-freeness", it may be a good idea to make a reasonable update of dependencies a routine step before every release.

Describe the solution you'd like

What would help (if not already in use):

  1. enabling GitHub's alerts for vulnerable dependencies for Pulsar, see https://docs.github.com/en/free-pro-team@latest/github/managing-security-vulnerabilities/about-alerts-for-vulnerable-dependencies

-> if possible, a bot should automatically open an issue to fix these findings / update the dependencies as soon as fixes are available

  2. since possibly not all vulnerabilities are reported/found, it may also be an idea to have a dynamic/automated table of dependencies:
    • column 1: name of dependency
    • column 2: versions of dependencies used in the latest pulsar release e.g. see https://frontbackend.com/maven/artifact/org.apache.pulsar/pulsar/2.6.2
    • column 3: latest version of dependency available (if hosted at GitHub: accessible with GitHub API)

-> before every release one should look at this table and update all (or most) dependencies to their latest version (or note why this is not possible at this time, e.g. incompatible changes)

-> of course one could automate opening update issues as well, but this may result in too many intermediate steps between releases
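
The three-column table proposed above could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `Row` type and the sample coordinates (`org.example:lib-a` etc.) are hypothetical, the "latest" versions are passed in by hand rather than fetched from any API, and the version comparison is a rough numeric one, not Maven's full ordering rules.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Row(NamedTuple):
    name: str       # column 1: name of dependency
    used: str       # column 2: version used in the latest release
    latest: str     # column 3: latest version available
    outdated: bool  # extra flag: is column 3 newer than column 2?


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '4.1.51.Final' into (4, 1, 51) for a rough numeric comparison.

    Non-numeric qualifiers are ignored, so this is NOT full Maven ordering.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def build_table(used: Dict[str, str], latest: Dict[str, str]) -> List[Row]:
    """Build one row per dependency, sorted by name."""
    rows = []
    for name in sorted(used):
        # If no upstream version is known, assume the used one is current.
        newest = latest.get(name, used[name])
        is_outdated = version_key(newest) > version_key(used[name])
        rows.append(Row(name, used[name], newest, is_outdated))
    return rows


# Hypothetical sample data, not real Pulsar dependencies.
used_versions = {"org.example:lib-a": "1.2.3", "org.example:lib-b": "2.0.0"}
latest_versions = {"org.example:lib-a": "1.10.0", "org.example:lib-b": "2.0.0"}

for row in build_table(used_versions, latest_versions):
    print(f"{row.name}\t{row.used}\t{row.latest}\t{'UPDATE' if row.outdated else 'ok'}")
```

Comparing numeric tuples instead of raw strings matters here: as strings, "1.10.0" would sort before "1.2.3" and the update would be missed.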

hpvd avatar Dec 03 '20 11:12 hpvd

Here you can find a blog post announcing the availability of automatic code scanning for security: https://github.blog/2020-09-30-code-scanning-is-now-available/

hpvd avatar Dec 04 '20 12:12 hpvd

@hpvd thank you for reporting this. We will consider it in our future releases.

sijie avatar Dec 08 '20 22:12 sijie

A new GitHub feature that may also lead to some kind of "security routine" when merging pull requests was presented at GitHub Universe 2020: "Dependency Review". From the announcement:

Dependency review

Today, dependency graph helps you understand your dependencies, and security alerts notify you of newly discovered vulnerabilities in your dependencies. But what if you could receive these alerts before introducing vulnerable code through new or updated dependencies? Dependency review helps reviewers and contributors understand dependency changes and their security impact at every pull request.

https://github.blog/2020-12-08-new-from-universe-2020-dark-mode-github-sponsors-for-companies-and-more/ also https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/reviewing-dependency-changes-in-a-pull-request

hpvd avatar Dec 09 '20 09:12 hpvd

These points could possibly be classified as "low-hanging fruit" in the field of security (at least if they work as expected and do not introduce too many false-positive findings...)

hpvd avatar Dec 09 '20 11:12 hpvd

As a last point on this topic: it may also be interesting to give GitHub's "super linter" a try and let it check the whole project on every release or on every pull request via a GitHub Action... see https://github.com/github/super-linter/

hpvd avatar Dec 09 '20 11:12 hpvd

We use the dependency-check-maven Maven plugin to automate CVE checks of the dependencies we use against an up-to-date database as part of the build process. It is pretty straightforward.

fmiguelez avatar Jan 21 '21 10:01 fmiguelez

Cool @fmiguelez! Would you please push a PR to enable this great plugin? Also, this should be checked in CI to avoid introducing known CVEs.

codelipenghui avatar Jan 28 '21 07:01 codelipenghui

Hello guys

We are trying to certify Pulsar against a few security standards. We scanned the Pulsar 2.7.0 image with WhiteSource. Unfortunately, 167 high-risk CVEs were discovered in 55 outdated libraries that were marked as high-risk vulnerable.

This makes our effort to certify Pulsar for a highly secured production environment a "bit" complicated :disappointed:

On the other hand, there is this open issue about automated security scanning.

Is there any chance to move this issue forward, or at least to upgrade the outdated high-risk libraries? It could significantly boost the adoption of Pulsar in many security-regulated environments.

alexku7 avatar Mar 04 '21 18:03 alexku7

Many thanks @alexku7 for describing your findings and view in detail, including the concrete consequences. IMHO this is an obstacle not only for "highly secured production environments" but for a considerable share of possible production uses. As described in this issue and its comments, it is not only about security but also about performance and "bug-freeness", both of which can save a lot of time otherwise spent analyzing, allocating and solving problems that may already have been fixed by others... If this is taken care of as a routine in every release, it should be (after an initial bigger step) a manageable amount of work and a smart way to catch some of the "low-hanging fruit"...

hpvd avatar Mar 04 '21 20:03 hpvd

-> Could there be better advertising for Pulsar's awesome quality than being used directly by people and companies working in highly secured fields?? :-)

hpvd avatar Mar 04 '21 20:03 hpvd

Yeah, these code / dependency / image scanners are pretty harsh, but several of our own customers want security reports for all dependent software, so any effort to minimize these issues in Pulsar, especially in a maintenance release (e.g. 2.6.4), could be extremely valuable. And if there's a documented process for mitigating them in a PR, then even someone like me could probably do it, as it's in our own interest, and I'm happy to deliver value to the broader community :-)

frankjkelly avatar Mar 04 '21 20:03 frankjkelly

Of course we have also seen the major work on security and code quality in the past months (probably going live in v2.8), like

  • enabling spotbugs in many components,
  • working on E2E encryption,
  • fixing things resulting in flaky tests
  • etc...

-> this is pretty awesome, and important. Beyond that, this issue is about the routines and automation that make it possible to get the most out of all the work put into Pulsar.

hpvd avatar Mar 04 '21 20:03 hpvd

@alexku7 would be happy to see the statistics when scanning the upcoming v2.8 with the same tool (WhiteSource)!

hpvd avatar Mar 04 '21 21:03 hpvd

> @alexku7 would be happy to see the statistics when scanning the upcoming v2.8 with the same tool (WhiteSource)!

Sure :) no problem. I posted the exported report for 2.7.0 in the Slack channel: https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1614882939234100

alexku7 avatar Mar 04 '21 21:03 alexku7

There's now #10855 to add a scheduled OWASP Dependency Check to scan library vulnerabilities once per day.

lhotari avatar Jun 07 '21 13:06 lhotari

@lhotari this is great news! Thanks so much!

frankjkelly avatar Jun 07 '21 13:06 frankjkelly

awesome ;-)

hpvd avatar Jun 07 '21 14:06 hpvd

The results of the scheduled OWASP Dependency Check scans can be found here: https://github.com/apache/pulsar/actions/workflows/ci-owasp-dependency-check.yaml

lhotari avatar Jun 08 '21 06:06 lhotari

Just another topic for further optimizing code quality and security: use automatic fuzzing to find bugs (e.g. as part of CI / via a GitHub Action) https://github.com/apache/pulsar/issues/12789

-> with the latest CI integration options, this is now relatively easy to use yet powerful

hpvd avatar Nov 16 '21 07:11 hpvd

Just learned about GitHub's dependency graph. When looking into it for Pulsar, there are

  • 200+ dependencies found
  • many of these are somewhat outdated, and newer versions are available (in some cases for months or even 1 or 2 years already)
  • some are forks of others that are no longer updated (e.g. since May 2015, while the "origin" is still maintained: https://github.com/cybernetics/hppc/tags vs https://github.com/carrotsearch/hppc/releases/tag/0.9.1 )

dependency graph for pulsar: https://github.com/apache/pulsar/network/dependencies

hpvd avatar Feb 01 '22 22:02 hpvd

just to have a first impression without having to leave this issue:

| Definition | Number of dependencies |
|---|---|
| Dependencies defined in pom.xml | 170 |
| Dependencies defined in tests/pom.xml | 1 |
| Dependencies defined in docker/pom.xml | 1 |
| Dependencies defined in pulsar-io/pom.xml | 1 |
| Dependencies defined in testmocks/pom.xml | 7 |
| Dependencies defined in buildtools/pom.xml | 20 |
| Dependencies defined in pulsar-sql/pom.xml | 19 |
| Dependencies defined in distribution/pom.xml | 1 |
| Dependencies defined in pulsar-proxy/pom.xml | 23 |
| Dependencies defined in …/pulsar/pom.xml | 5 |
| Dependencies defined in pulsar-broker/pom.xml | 49 |
| Dependencies defined in pulsar-client/pom.xml | 24 |
| Dependencies defined in pulsar-common/pom.xml | 33 |
| Dependencies defined in …/website/package.json | 16 |
| Dependencies defined in jclouds-shaded/pom.xml | 3 |
| Dependencies defined in managed-ledger/pom.xml | 14 |
| ... | ... |
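
Per-file counts like these can also be approximated locally. The sketch below is only an assumption about how one might do it, not how GitHub's dependency graph computes its numbers: it collects the distinct `groupId:artifactId` pairs from every `<dependency>` element in a POM (including those under `dependencyManagement` and plugin dependencies), so its totals will not necessarily match the table. The function name and the sample POM are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Namespace used by standard Maven POM files.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def declared_dependencies(pom_xml: str) -> Set[str]:
    """Return the distinct 'groupId:artifactId' pairs declared in a POM."""
    root = ET.fromstring(pom_xml)
    coordinates = set()
    # iter() walks the whole tree, so nested <dependency> elements count too.
    for dep in root.iter(f"{POM_NS}dependency"):
        group = dep.findtext(f"{POM_NS}groupId", default="?")
        artifact = dep.findtext(f"{POM_NS}artifactId", default="?")
        coordinates.add(f"{group}:{artifact}")
    return coordinates


# Hypothetical POM for illustration only.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.example</groupId>
        <artifactId>lib-a</artifactId>
        <version>1.0</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>lib-a</artifactId>
    </dependency>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>lib-b</artifactId>
      <version>2.0</version>
    </dependency>
  </dependencies>
</project>"""

print(len(declared_dependencies(SAMPLE_POM)))
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`. Using a set avoids counting a dependency twice when it appears both in `dependencyManagement` and in `dependencies`.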

hpvd avatar Feb 01 '22 22:02 hpvd

With this high number of dependencies of all kinds and different ages, the main question that is bothering me:

=> Is it enough (or at least the best thing we could do at this time) if only the dependencies with already well-known/reported security issues are identified and updated, as addressed in https://github.com/apache/pulsar/pull/13972 (which is great of course!!)?

-> a) Or is there a big risk of sacrificing security, performance and bug-freeness that we don't see yet (see the goal of this issue https://github.com/apache/pulsar/issues/8815#issue-756101012), resulting from some of the other dependencies (with no security risks reported yet) for which updates are also already available (sometimes for a long time)?

-> b) How can we be sure that every dependency introduced several years ago is still in use / really needed in today's Pulsar?

hpvd avatar Feb 02 '22 11:02 hpvd

Just to show that the numbers are constantly growing (yes, this is not a statistic ;-) it only conveys the feeling...): from yesterday to today, one more dependency was introduced

| Definition | Number of dependencies on 01 Feb 2022 | Number of dependencies on 02 Feb 2022 |
|---|---|---|
| Dependencies defined in pom.xml | 170 | 171 |

hpvd avatar Feb 02 '22 18:02 hpvd

> With this high number of dependencies of all kinds and different ages, the main question that is bothering me:
>
> => Is it enough (or at least the best thing we could do at this time) if only the dependencies with already well-known/reported security issues are identified and updated, as addressed in #13972 (which is great of course!!)?
>
> -> a) Or is there a big risk of sacrificing security, performance and bug-freeness that we don't see yet (see the goal of this issue #8815 (comment)), resulting from some of the other dependencies (with no security risks reported yet) for which updates are also already available (sometimes for a long time)?
>
> -> b) How can we be sure that every dependency introduced several years ago is still in use / really needed in today's Pulsar?

Very good questions.

@nicoloboschi and @dlg99 from DataStax have been contributing many changes to address vulnerable library versions. DataStax has bought a license for Sonatype IQ Server and also scans Apache Pulsar frequently.

Another aspect of software supply chain security is build reproducibility: are the built artifacts really built from the source code they claim to be built from? For Java projects, there's more information at https://reproducible-builds.org/docs/jvm/ and https://github.com/jvm-repo-rebuild/reproducible-central . It would be good to get Apache Pulsar into the Reproducible Builds program. Reproducible builds have been discussed a few times.

@hpvd Since the mailing list is the main channel for making major decisions in Apache projects, it would be useful to bring up your improvement suggestions to the Apache Pulsar community. [email protected] would be a good list to have this discussion. Mailing list details are at https://pulsar.apache.org/en/contact/ .

lhotari avatar Feb 02 '22 21:02 lhotari

many thanks for your answer, additional details and advice! Will bring some points to the list within the next weeks...

btw: does anybody look at Pulsar with a tool like JArchitect to keep a good overview of dependencies? Sounds interesting/helpful to me:

dependency graphs etc https://www.jarchitect.com/JArchitectv2020

JArchitect comes with several facilities that allow the efficient dependency management. In seconds you can know which part of the code will be impacted if you refactor a class, you can be advised if a layer dependency violation has been accidentally created, you can pinpoint precisely which part of the code relies on a particular tier component, you can list methods that can be reached from a given method etc…

edit: deactivated active link

edit2: there seems to be a trial: "The trial license is fully featured, but time limited (14-day free trial.)"

hpvd avatar Feb 04 '22 12:02 hpvd

Another interesting topic in this field of automatic security scanning: automatic scanning for CWEs (in addition to CVEs) https://github.com/apache/pulsar/issues/17069

hpvd avatar Aug 11 '22 15:08 hpvd

Just to visualize/summarize the current state: our current procedure/routine seems to have missed 35 fixable vulnerabilities (CVEs) when releasing the latest version, 2.10.2

okay, a (very) few fewer if

  • not all were publicly known on release day last week
  • we do not want to fix all (why??)
  • or we can't fix all immediately because of really major changes in how dependencies work, which take more time to adapt to...

for details see https://github.com/apache/pulsar/issues/18348

hpvd avatar Nov 04 '22 16:11 hpvd

Moved to the open-ended discussion forum.

I suggest you send patches directly; the maintainers will be glad to review them. Repeated requests alone help little: open-source software grows through contributions.

tisonkun avatar Dec 28 '22 14:12 tisonkun