wg-chaoseng
Draft chaos engineering definition/whitepaper
Keen to help with that!
Happy to support the effort too.
Me too :)
Ping
The best bet currently is to contribute to the proposal here, which sketches an outline of what could become a whitepaper/landscape:
https://docs.google.com/document/d/1BeeJZIyReCFNLJQrZjwA4KMlUJelxFFEv3IwED16lHE/edit?ts=5ace0eab#heading=h.k8f5ndt8affu
Here are my ideas for a draft outline. I'd love feedback since I'm still new to this space:
- What is chaos engineering?
- A history of chaos engineering
- Chaos Engineering Use Cases
- Planning Experiments
- Chaos Engineering in Cloud Native Systems
- Chaos Culture: Planning Chaos/GameDays
- Conclusion
ping
@caniszczyk That document is getting hard to navigate and make sense of. I'm happy to move it to this repo so we can start using GH issues instead.
GH is not a document-collaboration tool, but if we clearly mark each section in the proposal, we could simply refer to each section from GH issues for discussion.
+1 to moving to GitHub
In reply to Sylvain Hellegouarch's comment of Mon, 21 May 2018, 21:47.
-- Mikolaj Pawlikowski
Regarding the outline @caniszczyk, it's a good starting point. I might add a section regarding chaos engineering in relation to other disciplines/practices: security, CI/CD... Basically, where does CE fit in the toolchain? But maybe this is already covered by "CE in Cloud Native Systems"?
I agree with @Lawouach and @seeker89, the Google doc got crowded fast :)
We could just do a bit of Markdown on individual sections and then generate something, e.g. a PDF, when needed.
At everyone's suggestion, I moved what we had in the gdoc to here:
https://github.com/chaoseng/wg-chaoseng/blob/master/WHITEPAPER.md
It needs a lot of work but now we can start iterating via pull requests.
cc: @chaoseng/maintainers
@caniszczyk +1
Hey all,
Here is a strawman structure for the whitepaper. Hopefully it will help the discussion :)
Chaos Engineering Whitepaper v0.1
What is Chaos Engineering?
Short History
Principles
Objective: Harness and Improve System Resilience
Benefits for Cloud Native Systems
Relation to Existing Software and Operational Practices
Use Cases
Practicing Chaos Engineering
Chaos Engineering Flow
Define a Baseline
State the Hypothesis to Confirm/Refute
Determine a Perturbation to Perform
Chaos Engineering Perturbations
Degrade Network Conditions
Vary Computing Resources
Stress to the Limits
Simulate Data Loss
Change ACLs Permissions
Provoke a Security Breach
Chaos Engineering Automation
Continuous Chaos Engineering
Chaos Engineering Reporting
Report Findings
Hi @Lawouach
Thank you for taking the time to organize things a bit. Where does the landscape fit in this structure? Can it be put in another document?
Hey @veggiemonk. Thanks, it looks like nothing when I look at it now but finding the right phrasing took me half a day the other day. Formalizing is hard :D
It depends on how we organize the whitepaper. Either we list a few examples in each section (for instance, under "Degrade Network Conditions" we could mention Gremlin, Pumba, Muxy...) so that each topic sits next to its potential vendors,
or we continue with a long list of vendors at the bottom of the paper.
Hi @Lawouach, I totally understand that's hard work! 🙏
For now, the landscape doesn't need to be too formal because the list isn't actually that long. As a suggestion, let's keep it at the end.
What do you think?
I don't know if the whitepaper is the right place for this, but what about renaming the section "Chaos Engineering Flow" to "How to start Chaos Engineering"? As a first step, we could add "Set up monitoring". As a second step, we could add "Warn users/developers about it"?
It seems pretty basic but without that it can be hard/dangerous to do CE. Maybe it is too simple for this paper.
What are your views on that?
Interesting, I like the guidelines approach indeed.
There is certainly room for a section around the theory, as per the principles. But a "how to get started" one would be very welcome indeed!
How to get started + Links to product landscape and getting started points there would be awesome
Ok let's see what kind of resources we can gather in there.
A section of case studies and papers around the field was something we discussed in the last meeting also. Maybe as a very final section on 'Further Reading' ?
@Lawouach thank you so much for getting this started!
What do people think about starting a branch with @Lawouach's structure as a README that we can open PRs against, with sections filled in? A merged PR would count as approval, we could go deeper on specific content for each section, and we would link each PR in this issue.
I think I will refine it, taking into account the comments that were made. Give me a moment :)
Chaos Engineering Whitepaper v0.1
What is Chaos Engineering?
Short History
Principles
Discuss the steady state, experiment, etc. Just to set the "theory"?
Why Practice Chaos Engineering?
Harness and Improve System Resilience
If Chaos Engineering isn't the goal per se, what is? Resiliency? Reliability?
Benefits for Cloud Native Systems
Software and Operational Practices In Production
A clear indication that, whereas testing and CI/CD are mostly upstream practices, Chaos Engineering is very much downstream and acts against a live system. Would that make sense?
Use Cases
The current use-cases are a good starting point but should we detail them? Similar to the depth we can find in the serverless whitepaper?
Practicing Chaos Engineering
Getting Started With Chaos Engineering
Is my system ready to endure Chaos Engineering?
Should we hint at what minimal level you need to be before getting started? I mean, what if your system is barely resilient as it is?
Do I need to get started in production?
While we may want this, starting in prod may not fit "getting started" scenarios.
Communicate with the Organization
This is where we need to continue the discussion and figure out how far we want/can go with the patterns.
Should we talk about GameDays, for instance? Observability?
The following phases may or may not be useful. I think it would be valuable if we could describe what it means to deal with chaos in those various cases, but is it the right place?
Chaos Engineering Perturbations
Degrade Network Conditions
Vary Computing Resources
Stress to the Limits
Simulate Data Loss
Change ACLs Permissions
Provoke a Security Breach
Assume application fails to restart
Chaos Engineering Automation
Continuous Chaos Engineering
Chaos Engineering Reporting
Report Findings
Landscape
That looks good! Thanks @Lawouach for the hard work!
I think a PR is in order for us to move forward.
@chaoseng/maintainers (CC @caniszczyk) so just out of curiosity what is the plan on iterating on this document now? I had a few minutes this afternoon and wanted to add some of my thoughts here, but it's a bit difficult to know where to start.
I'm happy to just take some time, make some edits and submit a PR for consideration, but didn't want to ruffle any feathers or step on any toes. Would it be beneficial to assign topics to individuals to comment on? Just thinking out loud here.
Hey @mattforni, I'd say it's totally fine to offer PRs to the document?
On my side, I used this issue because it felt quicker to get started, but I wonder whether that would scale to a whole document :D
PRs are the way to move forward! ⏩
PRs please :)
In reply to Julien Bisconti's comment of Thu, Jun 28, 2018, 8:54 AM.
-- Cheers,
Chris Aniszczyk http://aniszczyk.org +1 512 961 6719
Started on my trail of thoughts https://github.com/chaoseng/wg-chaoseng/pull/41