Update page on Shift security left
See related #516.
[!IMPORTANT] Front end topics are out of scope at this point. When the book expands to show builds for more than JVM platform languages, that would be a good time to revisit and consider the principles for the JS ecosystem.
Acceptance criteria
- Address the questions raised by each of Dan's remarks below: make a decision, then either create a card or make the change as part of this card
Context
Remarks from a security expert (Dan Wallach):
To my mind, the biggest way to shift security left is to enumerate all the relevant security problems you might face and make sure that you have a plan for each one. So, if you're doing a web microservice, you should have a checklist of sorts, for example:
- [x] Contact Dan Wallach after we're OK with edits, and ask him nicely to review.
- [ ] Second round of review with Dan
From second review with Dan
Right now, you have coverage, to some degree, of three topics:
- Managing secrets, which shouldn't be in your repo
- Managing security issues with your dependencies
- Having some kind of security linter
Some feedback on these:
- For managing secrets, this is a topic that's much broader than Java. GitHub has a way of stuffing things into your environment variables, which is great for GitHub Actions, but what if you're using something else? This is particularly important for things like API keys that you might be using for testing.
- For dependencies, you need broader advice about dependency management. This is also a topic that's way bigger than just Java. Like, don't just take a dependency because it solves a problem for you today. Is it actively maintained? Does it have a community around it? Does it inhale a ton of external dependencies? One notable thing about Java is that many libraries brag about being zero dependency. This is a feature, in some ways, but it also means that gluing these libraries together can be complicated when they each have their own way of going beyond the Java core libraries' limitations. (This is where I digress and grumble that VAVR is now abandoned by its developer, and the alternatives aren't as good.)
- For a security and general-purpose linter, I'm a fan of Google's ErrorProne. It's used daily by their entire organization, so the rules have a habit of being reasonable.
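As a possible starting point for the secrets feedback, here is a minimal sketch of reading an API key from the environment instead of from the repo. It should behave the same under GitHub Actions or any other CI that can set environment variables. The class name `ApiKeys` and the variable name `EXAMPLE_API_KEY` are made up for illustration and do not come from the book or from Dan's remarks.

```java
import java.util.Map;
import java.util.Optional;

public final class ApiKeys {
    private ApiKeys() {}

    // Hypothetical variable name, for illustration only.
    static final String ENV_NAME = "EXAMPLE_API_KEY";

    /** Looks up the key in the given environment; empty if missing or blank. */
    static Optional<String> apiKey(Map<String, String> env) {
        return Optional.ofNullable(env.get(ENV_NAME))
                .map(String::trim)
                .filter(value -> !value.isEmpty());
    }

    public static void main(String[] args) {
        Optional<String> key = apiKey(System.getenv());
        // Report only whether the key is present; never print the secret itself.
        System.out.println(key.isPresent()
                ? ENV_NAME + " is set"
                : ENV_NAME + " is not set; skipping tests that need it");
    }
}
```

Passing the environment in as a `Map` keeps the lookup testable without touching the real process environment.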
Topics you might add:
- A broad overview of software security, in general, and the benefits of using Java over C/C++ (memory safety) and even Python or JavaScript (static typing vs. dynamic typing). Finding bugs sooner = better software.
- Leveraging the Java type system for correctness and security. For example, if you want something to be read-only, then you can use class interfaces around internal data to provide getters but not setters. Similarly, if you want well-formed arguments to a complex function, you can use a builder pattern which helps the programmer auto-complete their way to having a well-formed input. If you want to avoid null pointer exceptions, you can leverage all the new annotations (@Nonnull, etc.). ErrorProne checks these statically.
- How to build/support a library that you publish through MavenCentral or other such things. (Example oddity: the need for your packages to be named in relation to your email domain, so me needing to have "edu.rice.whatever" for my packages coming from Rice.)
- How to deal with packing up and using a JRE as part of a standalone software distribution. OpenJDK versus other JDKs. Licensing issues. Keeping your stuff up to date.
- How to deal with JNI, and why you should be deathly afraid of things going horribly awry, because you're allowing for unsafety to creep into your previously safe Java code.
- Should you really be coding in Java, versus Kotlin or Scala or maybe Clojure? I'll argue that Kotlin is almost always the right answer, unless you've got a very specific need.
- Should you use the GraalVM compiler? Why not? (Licensing issues?)
- Should you use a JavaScript engine? Which one? V8? Rhino?
- The sad history of the Java security manager, and why Java is no longer suitable for applets in your web page.
- Why you should never, under any circumstances, use Java's built-in serialization -- because an attacker can create serialized objects that hijack you when you deserialize them. If you need human-readable, use JSON. If you need binary and fast, use protobufs.
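To make the type-system topic above more concrete, here is a rough sketch of a read-only interface (getters, no setters) combined with a builder that validates its input. All names (`ReadOnlyExample`, `ServerSettings`, `Builder`, the 8080 default) are hypothetical. The null checks use `Objects.requireNonNull` at runtime; the annotation-based static checks Dan mentions would need an annotation library plus ErrorProne and are not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public final class ReadOnlyExample {
    private ReadOnlyExample() {}

    /** Read-only view: getters but no setters, so holders of this type cannot mutate it. */
    interface ServerSettings {
        String host();
        int port();
        List<String> allowedOrigins();
    }

    /** Builder that checks each argument, so build() only returns well-formed settings. */
    static final class Builder {
        private String host;
        private int port = 8080; // hypothetical default, for illustration
        private final List<String> origins = new ArrayList<>();

        Builder host(String host) {
            this.host = Objects.requireNonNull(host, "host");
            return this;
        }

        Builder port(int port) {
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("port out of range: " + port);
            }
            this.port = port;
            return this;
        }

        Builder allowOrigin(String origin) {
            origins.add(Objects.requireNonNull(origin, "origin"));
            return this;
        }

        ServerSettings build() {
            if (host == null) {
                throw new IllegalStateException("host is required");
            }
            // Copy the fields so later builder calls cannot change the built object.
            final String builtHost = host;
            final int builtPort = port;
            final List<String> builtOrigins =
                    Collections.unmodifiableList(new ArrayList<>(origins));
            return new ServerSettings() {
                @Override public String host() { return builtHost; }
                @Override public int port() { return builtPort; }
                @Override public List<String> allowedOrigins() { return builtOrigins; }
            };
        }
    }

    public static void main(String[] args) {
        ServerSettings settings = new Builder()
                .host("localhost")
                .port(8443)
                .allowOrigin("https://example.org")
                .build();
        System.out.println(settings.host() + ":" + settings.port()
                + " " + settings.allowedOrigins());
    }
}
```

Callers only ever see `ServerSettings`, so there is no setter to call, and the origins list is wrapped so that attempts to modify it throw `UnsupportedOperationException`.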
From first review by Dan
- [x] What are you doing about XSS, CSRF, SQL injection? Do you have a framework that addresses these concerns for you? Epic and spikes for frameworks for these? Answer: we won't do this
- [x] Related to CSRF: how are you authenticating your inbound connections? How do you internally track different users / entities in your system? Question for the aforementioned epic/spikes? Answer: we won't do this
- [ ] What about the rest of your web stack? If you're using a microservices library (e.g., Javalin), then what's below it (e.g., Jetty)? Dependency management topic - maybe add to further reading page
- [ ] Anywhere there's input from the outside world, how are you parsing it? What if somebody's trying to hack you through your parser? For example, if you're doing an NPM server, an easy mistake to make is parsing JSON as if it's just JavaScript and feeding it to eval. For binary file formats like JPEG, is your parser written in C / C++ or something safe like Java or Rust? Unsafe parsers for binary file formats are a common source of vulnerabilities. Needs mention, somewhat related to the first two items - spike to check this as secure using tooling?
- [x] As a general rule, where do you have C / C++ code in your system? If you're writing in Python, for example, many popular libraries have native code underneath for performance (e.g., numpy). Do an audit and decide if any of these present risks. Needs at least mention, probably not a lot of detail though. - not talking about
- [x] There's always going to be a dependency that turns out to have a security problem in it, that you won't/can't know about in advance. How will you learn about it? GitHub Dependabot is a good answer. Is it your answer? Similarly, for each dependency you might bring in, you should evaluate the author. How active is the repo? How responsive is the author to bugs? Dependency management topic
- [x] A related issue: are you dependent on a service that might go away? I used to pull in some dependencies from JCenter, but then it went away and those dependencies weren't hosted anywhere else. This means I literally can't do a new build of a project that I haven't touched for a few years. To even fix a minor bug, I'd have to do a bunch of reengineering to bring that project up to running with newer libraries. Runtime dependencies, maybe a brief mention but mostly out of scope since we are focused on build
- [x] There are all kinds of software scanning tools out there. I really like Google's ErrorProne for Java bugfinding, because you know it's been through the wringer of Google employees who emphatically don't like false positives. Another epic/spike candidate? card already created
- [x] Have you fuzzed your code, or your library dependencies? I'm a big fan of property-based testing, and there are several great libraries for Java that do it. (I used to use QuickTheories, but it hasn't been updated in years, so I'd probably look for something newer. https://github.com/quicktheories/QuickTheories) Using this, for example, I found that an Apache Commons library that I wanted to use for escaping and unescaping strings had bugs, because I was fuzzing for strings where decode(encode(x)) == x. QuickTheories found counterexamples! I then found unbescape (https://www.unbescape.org/) and all my tests passed. Another epic/spike candidate? Mutation testing = fuzzing
- [x] https://github.com/google/building-secure-and-reliable-systems While this could be a valuable resource, I am a bit hesitant to refer to it in the book without having read it
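For whoever picks up the property-based testing item, the round-trip property Dan describes, decode(encode(x)) == x, can be sketched with nothing but the JDK. This is a hand-rolled illustration and not the QuickTheories API: a real property-testing library adds better generators and shrinking of counterexamples. The class `RoundTripCheck`, the seed, and the deliberately naive codec are all made up for the example; Base64 stands in for the escaping library under test.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.Random;
import java.util.function.UnaryOperator;

public final class RoundTripCheck {
    private RoundTripCheck() {}

    /** Random string of 0..20 chars: roughly half printable ASCII, the rest below the surrogate range. */
    static String randomString(Random random) {
        int length = random.nextInt(21);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            char c = random.nextBoolean()
                    ? (char) (0x20 + random.nextInt(0x7F - 0x20)) // printable ASCII
                    : (char) random.nextInt(0xD800);              // avoids unpaired surrogates
            sb.append(c);
        }
        return sb.toString();
    }

    /** Returns the first generated x with decode(encode(x)) != x, or empty if none was found. */
    static Optional<String> findCounterexample(UnaryOperator<String> encode,
                                               UnaryOperator<String> decode,
                                               long seed, int trials) {
        Random random = new Random(seed); // fixed seed keeps failures reproducible
        for (int i = 0; i < trials; i++) {
            String x = randomString(random);
            if (!decode.apply(encode.apply(x)).equals(x)) {
                return Optional.of(x);
            }
        }
        return Optional.empty();
    }

    static String base64Encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    static String base64Decode(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    // Deliberately naive codec: a literal '+' in the input does not survive the round trip.
    static String naiveEncode(String s) { return s.replace(" ", "+"); }
    static String naiveDecode(String s) { return s.replace("+", " "); }

    public static void main(String[] args) {
        System.out.println("base64 counterexample: "
                + findCounterexample(RoundTripCheck::base64Encode, RoundTripCheck::base64Decode, 42L, 1000));
        System.out.println("naive counterexample:  "
                + findCounterexample(RoundTripCheck::naiveEncode, RoundTripCheck::naiveDecode, 42L, 1000));
    }
}
```

An empty result only means no counterexample turned up in the sampled inputs; it is not a proof that the codec is correct.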