Update ADOPTERS.md with Litmus usage details
The LitmusChaos Community is working towards increasing adoption of chaos engineering practices within the Kubernetes world & is focused on collaboration with other cloud-native projects. One of the ways of tracking the project's reach is via an ADOPTERS list. The purpose of this issue is to get a list of organizations/individuals who are using Litmus to power their chaos engineering practice and to share their use cases & reasons for choosing Litmus.
Please comment on this issue with details like:
- Applications/Workloads or Infra that are being subjected to chaos by Litmus
- Why was Litmus chosen & how it is helping you (a brief description on the usecase)
- Are you using it as part of devtest, CI/CD, in staging/pre-prod/prod or other
- If you would like your name (as standalone user) or organization name to be added to the Adopters.md, please provide a preferred contact handle like github id, twitter id, linkedin id, website etc.
This information will be used to create a PR on the ADOPTERS.md file, which you can approve. Alternatively, feel free to create a PR and reference this issue!
- I am currently using LitmusChaos to demonstrate a POC for Chaos Engineering on Serverless Architecture.
- I shall be presenting this at the DevFest Siberia 2020.
- GitHub ID: divya-mohan0209, Twitter Handle: Divya_Mohan02
I am using LitmusChaos as part of our QA cycle at the moment to verify resiliency and catch bugs. For now it is only used on AWS EKS and EC2 instances; we hope to expand it to Azure soon. Litmus looked solid, easy to implement and, most of all, easy to customise. GitHub ID: xkbarkar, NetApp Inc.
- K8s pods hosted on both AWS and Azure.
- Needed a clean way to introduce anomalies in the system to figure out its behaviour; Litmus was the one that was clean and easy to use.
- Using it as part of the QA cycle.
- Akridata
- Currently working on using Litmus for introducing Chaos in Kubernetes clusters.
- I was looking for a cloud-native way of introducing Chaos and after going through the details and other options, Litmus was probably the best fit.
- Usage of Litmus is still in preliminary stages. A limited set of chaos experiments are used for testing resiliency. This will change in the future.
- GitHub ID - ishantanu
Applications/Workloads or Infra that are being subjected to chaos by Litmus:
- Internal workload pods and storage resilience (OpenEBS). This is to test my built-in cluster resilience whilst running on arm64 architecture and to build confidence in the design and overall architecture.
Why was Litmus chosen & how it is helping you (a brief description on the usecase):
- I reviewed several chaos tools and felt that Litmus being associated with CNCF and being an open-source tool aligns with my own personal preference and values. It has a very active community and repository, and there was well-documented information that helped during the initial learning phases.
Are you using it as part of devtest, CI/CD, in staging/pre-prod/prod or other:
- I'm using it to run my RPi Kubernetes cluster which is my home cluster. This is running my personal production workloads.
If you would like your name (as standalone user) or organization name to be added to the Adopters.md, please provide a preferred contact handle like github id, twitter id, linkedin id, website etc:
- Github ORG https://github.com/raspbernetes & LinkedIn https://www.linkedin.com/in/michael-fornaro-5b756179/
Applications/Workloads or Infra that are being subjected to chaos by Litmus
- Kublr-provisioned Kubernetes clusters; we apply litmus chaos load to stress-test the clusters and identify the weak spots and components prone to failures under stress when customer applications stress the system
Why was Litmus chosen & how it is helping you (a brief description on the usecase)
- Litmus is well-documented, well-supported open source tool with a great community and development team. It is flexible and allows us to adjust the chaos tests any way we need.
Are you using it as part of devtest, CI/CD, in staging/pre-prod/prod or other
- This is currently used as part of development testing and ad hoc experiments, although we are working on including Litmus chaos tests in our standard automated QA process.
If you would like your name (as standalone user) or organization name to be added to the Adopters.md, please provide a preferred contact handle like github id, twitter id, linkedin id, website etc.
- Oleg Chunikhin, CTO, Kublr
- https://github.com/olegch
- https://twitter.com/olgch
- https://www.linkedin.com/in/olegch/
- Organization: Kublr, https://kublr.com
- https://github.com/kublr
- https://twitter.com/kublr
- https://www.linkedin.com/company/kublr/
Please add VMware as adopter. Will add more description later. Use case is Chaos Engineering in CD.
Why do we use Litmus? To ensure resilience, detect bugs and test rollouts. We are still in the early stages.
How do we use Litmus? Litmus is being used as part of dev/test cycles to catch bugs & verify resiliency.
Benefits of using Litmus: Litmus is easy to use and to extend for custom requirements, and it is a well-supported open source tool.
Please consider the shared file here as adopter for Pravega to acknowledge usage of Litmus Chaos, thanks.
Pravega.md
Why do we use Litmus? To inject network-related faults in a Kubernetes environment.
How do we use Litmus? Litmus is being used as part of QE testing.
Benefits of using Litmus: Litmus is easy to use and makes it easy to inject faults into the environment.
We are using Litmus Chaos to inject faults in our AKS environments. Before arriving at Litmus we explored other tools, but found Litmus to be the most well-rounded one and the one that aligned closest to the principles of chaos. We are using Litmus in our pre-prod environments, in the CI/CD stage, as a gate for releases.
The chaos-gated deployments make use of the built-in GitOps integration in Litmus.
https://www.neudesic.com/
We have used Litmus to build out Chaos Engineering platforms with some of our large E-Commerce customers to improve resilience for big sales periods such as Black Friday.
We looked into quite a few tools, and Litmus provided us with the flexibility we needed, whilst bootstrapping many of the components we would have to write ourselves.
We also used Litmus Chaos experiments when discussing some of our customer's architecture constraints, and showing them real world cases of how to make Kubernetes more resilient.
- One concrete use case was our customer wanting to build a cluster per app, whilst we wanted to build bigger clusters for easier management. We would use Litmus to show what application failure looks like on one part of the cluster, and show global resilience in their cluster when this happens.
The Litmus community and product have been a great addition to our tool stack, and provided many benefits for us.
We have been using Litmus 2.X at iFood for a couple of months, replacing chaostoolkit, as it provides a wider range of experiments out of the box. We've started using it to validate the fallback mechanisms of critical services monthly. Right now, we are expanding its usage to inject failures that drop access to databases, Redis, Kafka and AWS services, so that we can learn from them and take countermeasures to improve the critical services. I hope Litmus becomes the de facto tool for implementing Chaos Engineering in a simple manner. GitHub: bbarin, website: ifood.com.br
We at FIS Global have been embarking on a larger SRE program to transform platform teams from being purely operations-focused to having an SRE/automation culture and mindset. As part of that larger effort, Chaos/Resiliency Engineering was identified as a key program to improve stability and availability, and thus improve the overall reliability of applications across the organization and provide a superior customer experience. We have chosen Litmus as our Chaos Engineering tool because it:
- Fulfills all of resiliency testing requirements
- Has good and responsive community
- Has good documentation
- Is built on loosely coupled architecture
- Has nice dashboard features
- Exposes APIs to integrate with CI/CD pipelines
Where we are using Litmus
- Currently, we use it on applications/workloads, but the idea is to expand to infrastructure. For example, we use network latency to understand the resiliency of an upstream application/component when a downstream application/component is slow, and pod delete under production load to understand the application's ability to self-heal.
- Simulate experiments using Litmus to understand utilization of key JVM resources such as the thread pool, connection pool, heap memory, etc.
- Kafka resiliency: Kafka is itself a complex distributed system. We are planning to use Litmus network and memory hog experiments to simulate latency between Producer and Broker, Consumer and Broker, and Leader and Follower, and to understand how the cluster behaves under memory and CPU pressure.
- Integrate Litmus with CI/CD over APIs so that chaos testing can be autonomous.
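As a rough illustration of the pod-delete use case mentioned above, here is a minimal Python sketch that builds a Litmus ChaosEngine manifest as a plain dictionary, which a pipeline could serialize and apply. The engine name, namespace, label, and service account are hypothetical placeholders, and the field names follow the `litmuschaos.io/v1alpha1` ChaosEngine schema as I understand it, so check them against the Litmus version you run.

```python
import json


def build_pod_delete_engine(name, namespace, app_label, duration_s=30, interval_s=10):
    """Return a ChaosEngine manifest (as a dict) for the pod-delete experiment."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            # Target application; values here are placeholders.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            # Hypothetical service account name; it must exist in the cluster.
            "chaosServiceAccount": "pod-delete-sa",
            "experiments": [
                {
                    "name": "pod-delete",
                    "spec": {
                        "components": {
                            "env": [
                                {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
                                {"name": "CHAOS_INTERVAL", "value": str(interval_s)},
                                {"name": "FORCE", "value": "false"},
                            ]
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    engine = build_pod_delete_engine("demo-app-chaos", "staging", "app=demo-app")
    print(json.dumps(engine, indent=2))
```

The JSON output can be piped to `kubectl apply -f -`, since Kubernetes accepts JSON manifests as well as YAML.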
At adidas, we started a new initiative some months ago to implement chaos engineering practices, with the aim of giving engineering teams a guide and tools for testing the resilience of their applications. With this goal in mind, we started to define some best practices and processes to be shared with our engineering teams, and we started to evaluate a few tools.
After an evaluation of different tools, we decided to go ahead with Litmus Chaos. How are we using Litmus chaos:
Applications/Workloads or Infra that are being subjected to chaos by Litmus
- Litmus chaos will be provided by our platform team as part of their services. It will be running on kubernetes and will be available for engineering teams.
- Experiments such as pod deletion, network latency or packet loss, applied between functional dependencies like checkout & payments, login, order creation...
- Not applied in production yet.
Why was Litmus chosen & how it is helping you (a brief description of the usecase). We defined a set of priorities (each with a different value) and stoppers, then analyzed the tooling and selected the highest-valued one:
- Prio 1 & stoppers if not met: Full detailed documentation available in English, API / Shared Libraries, Control Injecting Failure, Permissions scope isolated, Authorization, Chaos scenarios - Parallel, Works with: Kubernetes, Open source
- Prio 2: Installation and Management, Metrics / Reporting, Halt attack, Automatic rollback, High/admin permissions on the node, Chaos scenarios as code, Chaos attacks - Serial, Custom or Specialized Attacks, Custom or Specialized Scenarios, Works with: AWS
- Prio 3: Access to the logs, Scheduling attacks, Health Checks, Application Attacks, Target Randomization, Network Attacks, VMs Attacks, Public API, Web UI
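The selection approach described above (weighted priorities plus stoppers) could be sketched roughly as follows. This is only an illustration: the weights, the tool names, and the shortened criteria list are hypothetical, since the comment does not state the actual values used.

```python
# Hypothetical weights per priority level; the actual values are not given.
PRIORITY_WEIGHTS = {1: 3, 2: 2, 3: 1}


def score_tool(supported, criteria):
    """Score one tool against prioritized criteria.

    supported: set of criterion names the tool satisfies.
    criteria: dict mapping criterion name -> priority (1, 2, or 3).
    Returns None when any priority-1 criterion (a stopper) is missing.
    """
    total = 0
    for criterion, priority in criteria.items():
        if criterion in supported:
            total += PRIORITY_WEIGHTS[priority]
        elif priority == 1:
            return None  # stopper not met: the tool is disqualified
    return total


def pick_best(tools, criteria):
    """Return the name of the highest-scoring tool that passes all stoppers."""
    scores = {name: score_tool(s, criteria) for name, s in tools.items()}
    eligible = {name: s for name, s in scores.items() if s is not None}
    return max(eligible, key=eligible.get) if eligible else None


if __name__ == "__main__":
    criteria = {
        "Works with Kubernetes": 1,
        "Open source": 1,
        "Halt attack": 2,
        "Web UI": 3,
    }
    # Placeholder tools, not real evaluation data.
    tools = {
        "tool_a": {"Works with Kubernetes", "Open source", "Halt attack"},
        "tool_b": {"Works with Kubernetes", "Halt attack", "Web UI"},
    }
    print(pick_best(tools, criteria))
```

Treating priority-1 items as hard filters rather than as large weights means a tool with many nice-to-have features cannot outscore one that meets every stopper.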
Are you using it as part of devtest, CI/CD, in staging/pre-prod/prod, or other
- Staging/pre-prod
- Planned to go to production and through CI/CD pipelines.
If you would like your name (as standalone user) or organization name to be added to the Adopters.md, please provide a preferred contact handle like GitHub id, Twitter id, LinkedIn id, website etc.
- company website: adidas.com
- company github: https://github.com/adidas
- personal Linkedin (SPOC): https://www.linkedin.com/in/victor-raton-arjona-5806b924/
We are utilizing Chaos Engineering for something else at the moment :) We found it very useful for building our engineers' confidence in responding to production incidents and for training them on cloud-native engineering practices. Check out this article, where I elaborate more on our workshop - https://www.infoq.com/articles/chaos-engineering-cloud-native/
After an evaluation period of some Chaos Engineering tools, we chose Litmus because it is a more mature tool that would meet most of our needs. We are in the implementation, configuration, and process definition phase. AB-Inbev's BEES is a huge project with hundreds of microservices. It has been a great challenge to adapt Litmus to this process, making customizations and counting on the help of the Litmus community to evolve the tool, and thus achieve our goal of making it available to the teams. Some points that made us choose Litmus:
- Based on K8S resources
- SSO
- Customization of attacks, attacks in parallel
- Installation on multiple clusters
- GitOps
At InfraCloud, we are using Litmus to develop Resiliency Frameworks. Why do we use Litmus? To simulate various chaos scenarios using fault injection templates provided by Litmus. Litmus also helps to incorporate custom fault templates developed using AWS SSM documents.
How do we use Litmus? So far, we have tested different kinds of scenarios, including faults like pod deletion, network latency, resource stressing, network partitioning in databases, and many more.
Benefits of using Litmus:
- Easy deployment.
- Easy Fault injection.
- Custom Grading for experiments
- SSM integration helps to inject fault in both EKS and external AWS components.
Company website: https://www.infracloud.io/ Company GitHub: https://github.com/infracloudio
We practice chaos engineering using Litmus in the Apache APISIX Ingress.
Litmus also helped us find hidden bugs.
Project website: https://apisix.apache.org/ This is the text version of my online sharing content. https://dev.to/apisix/building-a-more-robust-apache-apisix-ingress-controller-with-litmus-chaos-3ldn
At Baobab Group, we use LitmusChaos to orchestrate chaos on Kubernetes to help developers and SREs find weaknesses in their application deployments.
We use it in the QA and pre-prod stages to see how the workloads and AWS resources behave under failure injection.
How do we use Litmus? We run faults such as pod deletion and CPU hog against our Kubernetes workloads, and we plan to extend this to cloud services.
Benefits in using Litmus.
- GitOps friendly
- Integrate easily in cloud native environment.
- Easy Fault injection.
- Visualize chaos scenario
Company website: https://baobab.com/
User comment by IFS
Flipkart is an adopter of Litmus Chaos. In addition to using the core features, we have also built a VM chaos platform leveraging Litmus. The details are covered in this talk we gave at Chaos Carnival 2024 - Building a Chaos Platform for Virtual Machines with OpenSource Tools
- Applications/Workloads or Infra that are being subjected to chaos by Litmus
- Stateless services running on our Kubernetes infrastructure
- VMs running stateful workloads (using the VM Chaos platform built on top of Litmus)
- Why was Litmus chosen & how it is helping you (a brief description on the usecase)
- We did an exhaustive analysis of top opensource chaos tools based on various scenarios. Litmus chaos was a winner in terms of
- Good User experience and interface
- Stable and secure chaos infrastructure
- Detailed documentation and active opensource community
- Ease of modifying code ( We modified both backend and front end to suit our needs )
- Pre-built kubernetes native experiments
- Litmus helps us in testing out the failure scenarios - which helps in validating assumptions about failover capabilities of our infrastructure as well as validating run books and failure recovery
- Are you using it as part of devtest, CI/CD, in staging/pre-prod/prod or other
- Currently we are using this in pre-prod infrastructure.
- If you would like your name (as standalone user) or organization name to be added to the Adopters.md, please provide a preferred contact handle like github id, twitter id, linkedin id, website etc.
- www.flipkart.com
- https://github.com/Flipkart
Why do we use Litmus? We are using Litmus at Ericsson to perform resilience testing of our applications and to gain an understanding of how they perform in failure scenarios.
How do we use Litmus? We are using Litmus in the pre-production CI testing phase.
Benefits of using Litmus: Litmus is easy to use and provides a good level of functionality with the included fault scenarios, whilst the architecture allows for easily deploying custom faults if required. It provides the means to easily test scenarios that would otherwise be difficult to test.
Within Delivery Hero, two of our entities, Hungerstation and PedidosYa, have been leveraging Litmus to enhance the resilience of their services. We use various faults offered by Litmus, such as Network Latency and Network Corruption. Using Litmus, the teams have been able to test mechanisms such as circuit breaking, fallbacks, scaling behaviour and context timeouts. Building on this experience, we are currently developing an internal Chaos Engineering Platform, based on Litmus, as part of our Global Developer Platform initiative. This platform aims to standardize and elevate chaos engineering practices across all Delivery Hero verticals.
At Talend, we are using Litmus 2.x and Litmus 3.x within our pipeline and for weekly checks. Litmus was the solution we chose to help us on our journey with chaos engineering.
How do we use Litmus?
Litmus is deployed in our environment to validate our observability/security stack and to help promote our builds before they go live in production. We use it in a weekly job that uses Litmus as the chaos controller, along with a custom-built tool that collects results after the injected experiments and sends them to Slack as a report, so that we can improve the resilience of our observability/security stack.
We have also started using it to validate our SLIs/SLOs and their runbooks. Additionally, we use it in our Jenkins pipeline when we want to promote builds to production after QA tests, to validate that the new version withstands the newly injected turbulence.
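To make the reporting and promotion flow described above a little more concrete, here is a minimal Python sketch. It is not the custom tool mentioned in the comment, whose design is not described. It assumes experiment verdicts have already been collected as simple dictionaries (for example, from Litmus ChaosResult resources), and it only builds a Slack-style `{"text": ...}` payload; posting it to a webhook is left out. The function name and input shape are hypothetical.

```python
def summarize_verdicts(results):
    """Build a Slack-style message from experiment verdicts.

    results: list of dicts with "experiment" and "verdict" keys,
    where verdict is expected to be "Pass" or "Fail".
    Returns (payload, promote); promote is True only if there is at
    least one result and every verdict is "Pass".
    """
    lines = []
    for r in results:
        mark = "PASS" if r["verdict"] == "Pass" else "FAIL"
        lines.append(f"[{mark}] {r['experiment']}")

    promote = bool(results) and all(r["verdict"] == "Pass" for r in results)
    header = "Chaos run: OK to promote" if promote else "Chaos run: promotion blocked"
    payload = {"text": header + "\n" + "\n".join(lines)}
    return payload, promote


if __name__ == "__main__":
    # Placeholder verdicts, not real run data.
    sample = [
        {"experiment": "pod-delete", "verdict": "Pass"},
        {"experiment": "pod-network-latency", "verdict": "Fail"},
    ]
    payload, promote = summarize_verdicts(sample)
    print(payload["text"])
    print("promote:", promote)
```

A pipeline step could exit non-zero when `promote` is false, which would stop the build from being promoted.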
Benefits of using Litmus?
Litmus is a straightforward framework that provides multiple experiments and is easy to use by developers. It allows for the creation of specific chaos workflows depending on their needs.