
TAG Operational Resilience Tech Lead Nomination

Opened by mrbobbytables

Following the TAG Reboot Timeline, we are opening nominations for (3) Tech Leads for TAG Operational Resilience. If this interests you, please review the information on TAG governance and responsibilities in the TAG Governance doc and the draft charter for the TAG. Then, if you're still interested, please post your bio below and confirm your interest in running for Tech Lead.

Election timeline:

  • May 5: Nominations open for new TAG Technical Leads
  • May 19: TOC Vote opens for initial TAG Technical Leads (3 per TAG)
  • June 2: TOC Vote closes for initial TAG Technical Leads
  • June 2: Initial round of newly seated TAG Technical Leads announced
  • July 7: Nominations close for new TAG Technical Leads
  • July 7: TOC and TAG Chairs Vote opens for new TAG Technical Leads (TAG Chairs only vote for their TAG TLs)
  • July 28: TOC and TAG Chairs Vote closes
  • July 28: Newly seated Technical Leads announced

NOTE: Timeline is subject to change; check the TAG Reboot Timeline issue for the most up-to-date information.

Once the initial leads are seated, we'll work on refining the charters and really get things going. :)

Links:

  • TAG Restructuring Presentation - Feb 4, 2025
  • TAG Reboot Timeline Issue
  • TAG Governance Doc
  • Draft Charter

mrbobbytables, May 05 '25 22:05

I'd like to propose my candidacy as TAG Operational Resilience Tech Lead. Here is my bio:

Raffaele Spazzoli

https://www.linkedin.com/in/raffaelespazzoli/

Raffaele Spazzoli is a full-stack enterprise architect with 20+ years of experience. Raffaele started his career in Italy as a Java Architect, then gradually moved to Integration Architect and then Enterprise Architect. Later he moved to the United States, where he eventually became an OpenShift Architect for Red Hat consulting services, acquiring, in the process, knowledge of the infrastructure side of IT. Currently, Raffaele holds a cross-portfolio consulting position covering Red Hat products, with a focus on OpenShift. For most of his career, Raffaele has worked with large financial institutions, which has given him an understanding of the processes, security, and compliance requirements of large enterprise customers. Around 2019, Raffaele joined the CNCF TAG Storage and contributed to the Cloud Native Disaster Recovery whitepaper. He later took on the co-chair role for TAG Storage. Two major focus areas for Raffaele are improving the developer experience by implementing internal developer platforms (IDPs) and, more recently, helping organizations run virtual machines in Kubernetes.

raffaelespazzoli, May 06 '25 19:05

I want to self-nominate to run for Tech Lead of TAG Operational Resilience.

I am Rafael “Rafa” Brito, with 33 years of experience, which break down into 8 years of UNIX System V, 23 years of running Linux in production, 15 years in financial/highly regulated environments and high-frequency trading (read: resiliency, security, and low latency with throughput), 4 years of grid/HPC, and 9 years of Kubernetes.

I started my career in the 1990s in Rio de Janeiro and immigrated to New York in early 2001.

I was the Lead System Engineer and Architect at the New York Stock Exchange for twelve years. During those years, I led multiple projects, such as pioneering the adoption of Linux on the trading floor, helping bring the systems back online after 9/11, implementing SELinux for the first time in a financial firm, and becoming one of the first paying Puppet customers, as I (along, of course, with the engineering team) puppeteered the entire migration of 5,000 bare-metal servers in the NYSE datacenter from Brooklyn to New Jersey.

In 2013, I went to Citigroup. My team managed 80k physical cores for the entire bank, powering mission-critical applications running on the CitiGrid. During those years, I learned the true meaning of distributed computing.

I started my Kubernetes journey in 2016 as a bank's Global Engineer Lead of Containers. Leading a large team, I productized Kubernetes for a highly regulated environment and at scale. When I left the bank in 2019, we had over 300 applications running in production, both on-premises and off-premises.

My experience at the bank drove me to want to solve Kubernetes Day-2 operations challenges, such as performance, resource management, and migrations. I joined VMware and developed a Kubernetes Migrator (Patent pending US20240202019A1) to move stateful workloads across different clouds and Kubernetes distributions. Believe it or not, during this project, at 44 years old, I made my first code contribution to an open-source project (Project Velero).

In 2022, I joined a startup called StormForge, which uses machine learning to address Day-2 operations, managing Kubernetes resources, performance, and cost at scale. At StormForge, I have developed controllers, advised dozens of companies, and assessed hundreds (if not thousands) of workloads.

Because I experienced firsthand the challenges in Kubernetes operations, I became obsessed with educating the end-users about the intricacies of Kubernetes, Kubelet, and the Linux Kernel.

Since 2023, I have given dozens of talks at KCDs, meetups, and podcasts about performance, cost management, autoscaling, best practices, and more. I am currently co-authoring the 2nd edition of the book “Acing the CKA Exam,” to be released at the end of Q2 2025. I am also one of the co-authors of the book “Jornada Kubernetes,” in Brazilian Portuguese, to be released on July 16, 2025. I have made code contributions to OpenCost, a CNCF project.

I have become the lead organizer of the CNCF Austin, Texas chapter, and I co-organized KCD Texas 2024/2025 and KCD Brazil 2025.

Over the years, I have been known as an aggregator, a conciliator, and an advisor. Despite all these paragraphs bragging about myself, I am humble (actually, writing all this is a bit weird, but I guess it is part of the process).

Lastly, I want to be the Tech Lead of TAG Operational Resilience to learn more alongside the community and to give my time and experience back to open source, which has helped me so much in my career.

brito-rafa, May 14 '25 04:05

I am writing to formally nominate myself for the TAG Operational Resilience Tech Lead role. With a strong background in technology, risk management, and resilience planning, I believe I can contribute meaningfully to strengthening our operational frameworks and driving proactive solutions.

Over the past 20 years, I have led key initiatives focused on improving system performance, scalability, capacity, reliability, incident response, chaos engineering, and business continuity across complex technology landscapes in different business domains. I am consistently recognized as a competent individual, skilled at coordinating with cross-functional teams in a fast-paced, deadline-driven environment to steer projects to timely completion within budgetary constraints. I have experience leading 30+ engineers and architects, as well as identifying and enabling the right talent to set up an enterprise SRE team.

Patent holder for “Methods and systems for assessing the risk of a release to production environment” (US 20240420054A1).

I am an active speaker at various technology forums, including CNCF Coimbatore and LitmusCON, and an SRE community organizer. My approach combines technical depth with a collaborative mindset, enabling cross-functional alignment during high-impact events. Recent contributions include:

  • Leading the SRE team for the largest streaming provider in North America, reducing recurring incidents and improving reliability up to 99.99%.
  • Driving an OpenTelemetry implementation for a telco SaaS provider to consolidate its observability tool stack and achieve unified observability.
  • Contributing to a LinkedIn newsletter on observability that connects the dots between business and technical observability signals, in both directions, for improved correlation.
  • Contributing to the CNCF Litmus project to cover more out-of-the-box chaos scenarios and automated fault-tolerance validations.
  • Driving automated operational readiness checks, reliability maturity assessments, and observability maturity assessments to help organizations continuously improve their reliability.
  • Actively applying FMEA-driven chaos engineering practices to design and implement solutions with self-CHOP (self-configuring, self-healing, self-optimizing, self-protecting) capabilities.

I am passionate about advancing organizational maturity in operational resilience and fostering a culture of continuous improvement. Serving as a TAG Tech Lead would be a great opportunity to further that mission and support our broader goals.

You can check out my contributions in the resilience and reliability space on my LinkedIn profile: https://www.linkedin.com/in/kbsivacse/

Thank you for considering my nomination. I would be honored to contribute to TAG’s efforts and collaborate with peers in strengthening our resilience capabilities.

kbsivacse, May 14 '25 08:05

I would like to nominate myself for the TAG Operational Resilience Tech Lead position.

I am Alex Jones, a Principal Engineer at AWS with a deep focus on operational resilience and SRE practices across cloud-native environments. My journey spans major financial institutions (JPMorgan, American Express) and technology companies (Microsoft, British Sky Broadcasting), where I've consistently worked to improve system reliability, observability, and operational excellence.

As the creator and maintainer of K8sGPT, I've demonstrated my commitment to improving operational resilience in the cloud-native ecosystem. K8sGPT embodies the principles of operational excellence by:

  • Automating issue detection and triage in Kubernetes environments
  • Codifying SRE knowledge into actionable insights
  • Reducing mean time to resolution (MTTR) through AI-powered analysis
  • Democratizing operational knowledge for teams of all sizes

My contributions to operational resilience in the cloud-native community include:

  1. Tools and Projects:

    • Creator of K8sGPT and K8sGPT Operator, focusing on automated system analysis and reliability improvement
    • One of the founders of OpenFeature (CNCF project), working on reliable feature flag management
    • Development of various cloud-native tools focused on observability and reliability
  2. Community Leadership:

    • Regular speaker at KubeCon and CloudNativeCon, sharing insights on operational resilience
    • Author of "SLO's don't matter: A nihilist's guide to reliability" - challenging conventional wisdom in SRE practices
    • Active contributor to discussions about the future of cloud-native operations
  3. Practical Experience:

    • Led the implementation of SRE practices across multiple enterprise environments
    • Extensive experience in building and operating reliable systems at scale
    • Focus on creating positive engineering cultures that enable reliability through shared accountability

My approach to operational resilience centers on:

  1. Building systems that are reliable by design
  2. Implementing effective observability and monitoring
  3. Automating toil reduction through intelligent tooling
  4. Creating inclusive communities around operational excellence
  5. Ensuring ethical considerations in AI-powered operations

I believe my combination of hands-on experience building reliable systems, active community involvement, and focus on innovative solutions positions me well to contribute to TAG Operational Resilience's mission. I'm particularly interested in helping shape how we approach reliability in an AI-augmented future while maintaining robust operational practices.

I've consistently advocated for and implemented practices that make operational resilience more accessible and maintainable, as evidenced by my talks like "Beyond the Clouds: Charting the course for AI in the CloudNative world" and my work on making complex operational problems more approachable through tooling and automation.

Links:

  • GitHub: https://github.com/AlexsJones
  • Speaking: https://sessionize.com/jonesax/
  • LinkedIn: https://www.linkedin.com/in/jonesax/

AlexsJones, May 14 '25 19:05

Scott Mabe – CNCF TAG Operational Resilience Nomination

Hello, my name is Scott Mabe, and I’m currently a Technical Advocate at Datadog. I lead with compassion and understanding—teaching and elevating others brings me joy. This mindset has led me to volunteer with local tech organizations, including a Baltimore-based charity called Code in the Schools, which helps city youth learn computer science skills.

As an observability advocate, I enjoy helping others realize their goals. I focus on democratizing data access and building strong, collaborative teams. I'm passionate about helping others learn and grow—so much so that I have a tattoo that reads “Be excellent to each other.” I stay active in the tech community by speaking at events like DevOpsDays and volunteering at various conferences and meetups.

Prior to joining Datadog, I helped people learn cloud computing and open source software, both professionally and for fun. I've contributed to online communities such as the now-defunct Linux Academy, OG-AWS Slack, and others that have come and gone over the years. I enjoy helping people break into the tech industry—all while maintaining and supporting large-scale cloud computing efforts. Past work includes projects for the U.S. Federal Government, particularly the Department of Health and Human Services.

I live in Baltimore, Maryland, and bring a unique combination of technical expertise and cultural perspective. My somewhat unconventional background—including time spent as a radio DJ—continues to shape how I communicate with clarity and intent. Outside of work, I enjoy going to punk rock shows and learning something new every day.

I’m excited to apply my experience and passion to the TAG Operational Resilience Technical Lead role, contributing to the CNCF community.

You can connect with me on LinkedIn. Yes, anyone can, because everyone is welcome.

Scott-Mabe, May 15 '25 20:05

I submit herein my application to the TOC for consideration to serve as a Technical Lead for TAG Operational Resilience.

I discovered the CNCF in 2018 and joined the TOC as a contributor in 2020 (Add Matt Young as a TOC Contributor). At the time, I was hands-on, leading my team at a startup that had just gone public and that grew its traffic 40x through COVID. My full technical qualifications and experience are available upon request. In short, we engaged the business's engineering teams with the CNCF and were able to build a fantastic platform that supported the business.

At the same time, I drove the process to create TAG Observability and have served as its co-chair since 2020. I've spoken at a number of KubeCons and other conferences, and I created, managed, and marketed our TAG Observability Expert Speaker Series. I think we should build on this success and create similar programs for all domains.

I've proposed a few initiatives (summarized below) for the TOC's consideration and review. I think they form the basis for a healthy Call for Contribution, would attract skilled contributors from the CNCF's technical community, and would help the CNCF itself while benefiting its projects and community of practitioners.

I want to contribute on technical matters and, where possible, help shape and inform the CNCF's processes and capabilities. I also have a lot of ideas on how we can build and strengthen our communities and help them address the challenges and opportunities presented by CNCF project adopters and end users.

On Policy Resistance


Donella Meadows has (yet another) great quote on "Policy Resistance" in her book [Thinking in Systems](https://www.goodreads.com/book/show/3828902-thinking-in-systems) (see also https://donellameadows.org/):

Trap: When various actors try to pull a system state toward various goals, the result can be policy resistance. Any new policy, especially if it’s effective, just pulls the system state farther from the goals of other actors and produces additional resistance, with a result that no one likes, but that everyone expends considerable effort in maintaining.

The Way Out: Let go. Bring in all the actors and use the energy formerly expended on resistance to seek out mutually satisfactory ways for all goals to be realized--or redefinitions of larger and more important goals that everyone can pull toward together.

GitHub speed view


| Initiative | Timeline | Key Value |
|---|---|---|
| Substation Project Evolution | 6 months | Cloud-agnostic event processing patterns |
| Software Supply Chain Insights | 6-9 months | Unified view of CNCF dependency relationships |
| Community Knowledge Graph | 6-9 months | Connected view of CNCF community activities |
| Open Source Development Pattern Analysis | 9-12 months | Data-driven insights into development workflows |
| Project Capabilities Badging Framework | 3-6 months | Standardized project capability signaling |

1. Substation Project Evolution Plan for CNCF Sandbox

Core Value: Production-proven event processing patterns for cloud-native applications, evolved from AWS-specific to cloud-agnostic implementations using Crossplane XRDs.

Key Deliverables:

  • Cloud-agnostic documentation of event processing patterns
  • Crossplane XRD implementation with AWS reference
  • Local development environment for contributors
  • OpenTelemetry integration for observability
  • Three demonstrator applications showing real-world value

Impact: Provides foundational infrastructure for processing complex event streams in a cloud-native way, enabling the other initiatives and offering reusable patterns for CNCF projects and end users.

2. CNCF Software Supply Chain Insights

Core Value: Unified, queryable view of software supply chain relationships across all CNCF projects.

Key Deliverables:

  • Data pipelines for ingesting and normalizing supply chain metadata (SBOMs, VEX, attestations)
  • GUAC implementation hosted on CNCF infrastructure
  • Interactive visualization of transitive dependencies and vulnerabilities
  • Example queries for security posture, dependency tracking, and compliance insights

Impact: Enables understanding of collective dependency relationships and security posture across the CNCF ecosystem, providing actionable insights for vulnerability management and compliance.

3. CNCF Community Knowledge Graph: Projects and Activity

Core Value: Integrated view of CNCF community information connecting projects, people, content, and activities.

Key Deliverables:

  • ETL pipelines for community data sources (YouTube, blogs, GitHub activity)
  • Neo4j graph database implementation
  • Example queries revealing community structure and knowledge flows
  • Documentation of data sources and integration methods

Impact: Creates a foundation for understanding community trends, identifying subject matter experts, tracking technology adoption, and visualizing collaboration patterns across the ecosystem.

4. Large-Scale Open Source Development Pattern Analysis

Core Value: Data-driven understanding of development workflows and collaboration patterns across open source projects.

Key Deliverables:

  • Scalable GitHub Archive processing pipeline
  • Graph-based representation of PR/Issue interactions
  • Analysis of recurring development patterns in CNCF projects
  • Collaboration with academic researchers on methodology validation

Impact: Moves beyond simple metrics to identify non-obvious patterns of collaboration that can inform best practices and community health assessments for large projects.

5. CNCF Project Capabilities Badging Framework

Core Value: Standardized framework for communicating project capabilities and best practices adherence.

Key Deliverables:

  • Governance model and badge proposal process
  • Standard metadata format for badge representation
  • Display and discovery mechanisms for the CNCF Landscape
  • Example badge implementations in key technical areas

Impact: Makes project capabilities more discoverable to users and provides clear signals about project maturity in specific technical areas like observability, security, and documentation.


Together, these initiatives will enhance the CNCF's understanding of its projects, contributors, and ecosystem relationships, supporting more data-driven decision-making and community development.

  • https://github.com/cncf/toc/issues/1708
  • https://github.com/cncf/toc/issues/1709
  • https://github.com/cncf/toc/issues/1710
  • https://github.com/cncf/toc/issues/1711
  • https://github.com/cncf/toc/issues/1712

halcyondude, May 19 '25 16:05

I want to nominate myself for the TAG Operational Resilience Tech Lead position.

About Me

My name is Nabarun Pal, and I work as a Principal Engineer at Broadcom on the Kubernetes distribution. I work with Kubernetes and other CNCF projects (like containerd, etcd, Calico, Antrea, etc.) to make the distribution performant at scale. Previously, I worked on making kcp (another CNCF project) production-ready and able to serve hundreds of thousands of tenants. My work involves delivering reliable and sustainable Day 2 experiences to customers and giving them the ability to use the products at massive scale without compromising resiliency.

In the CNCF landscape, I serve as a co-chair for Kubernetes SIG Contributor Experience and as a Release Manager for Kubernetes, maintain the Kubernetes GitHub organizations as a GitHub Admin, and have been a CNCF Ambassador since 2023. In the past, I served on the Kubernetes release team for more than nine releases, as the Release Lead for Kubernetes 1.21, the Enhancements Team Lead for Kubernetes 1.19, the Branch Manager for 1.24, and the Emeritus Adviser for 1.26.

I have also served as an elected member of the Kubernetes Steering Committee from 2022 to 2024 and of the Code of Conduct Committee from 2021 to 2022. For my efforts on the project, I received contributor awards multiple times (2021 and 2022) and have also been featured in print media. To sustain the community, I have led multiple New Contributor Workshops and mentored several contributors and maintainers to grow in the CNCF ecosystem. I have also been part of multiple KubeCon + CloudNativeCon Program Committees and delivered a keynote at KubeCon India 2025.

Why me, and what do I plan to do?

  • Deeply vet and efficiently manage the lifecycle of subprojects under TAG Operational Resilience as well as subprojects which are in collaboration with others
  • Help the TOC evaluate inbound projects for the CNCF ecosystem and vet the health of existing projects
  • Collaborate with the TOC and other TAGs to run initiatives
  • Liaise with the projects under the banner of the TAG to resolve any of their technical blockers
  • Mentor and grow contributors across the global landscape by means of sustainable and accessible pathways

Through my past technical and leadership experience in projects under the CNCF umbrella, I have demonstrated expertise in the areas above.

Socials

  • GitHub: https://github.com/palnabarun
  • LinkedIn: https://linkedin.com/in/palnabarun
  • Credly: https://www.credly.com/users/palnabarun
  • Talks: https://nabarun.dev/speaking

palnabarun, May 20 '25 02:05

Thanks everyone for putting your nomination forward :) With the nomination period now closed, we're going to temporarily lock the issues just so it's clear that the nomination period is over. We'll reopen soon with updates. 👍

mrbobbytables, May 20 '25 13:05

Announcing the Tech Leads for TAG Operational Resilience:

  • Raffaele Spazzoli @raffaelespazzoli - 1 year term
  • Matt Young @halcyondude - 2 year term
  • Nabarun Pal @palnabarun - 2 year term

Thank you all for volunteering.

We encourage continued participation from everyone. There's plenty to do and many opportunities to come.

riaankleinhans, Jun 11 '25 19:06

Hi, I’m Iris Dyrmishi, and I’d love to throw my name in the hat for the TAG Operational Resilience Tech Lead role.

About me: I’m a Senior Observability Engineer at Miro and a CNCF Ambassador based in Portugal. I’ve been working in tech for just over six years now, always focused on Observability and SRE.

Day to day, I work on maintaining and evolving our observability platform to make sure it’s modern, reliable, scalable, and easy to use. I focus on making observability feel out-of-the-box for engineers, helping teams get the right signals with minimal friction and take clear ownership of their telemetry data. A big part of my work is solving real reliability challenges and making sure observability is deeply integrated into how we build and operate systems, not only technically but also culturally.

One of the things I’m most passionate about is OpenTelemetry. I’m a strong advocate for it in both my day job and the community. I’ve worked closely on scaling it internally, and I actively share those learnings through talks, panels, and mentoring. I care deeply about making OTel accessible for teams trying to get started, and I always try to represent the end-user voice in the conversation. Some recent examples of that advocacy and community work include:

I’m also an active part of the CNCF ecosystem, where I focus on building community.

  • CNCF Ambassador – As part of the global CNCF Ambassador program, I advocate for open source, community-driven innovation and help grow awareness and adoption of cloud native technologies through talks, content, and community building.

  • Co-organizer and Program Chair for KCD Porto – I am leading the content for KCD Porto for the second year, shaping the agenda, driving the CFP process, and supporting speakers to deliver their best on stage. I focus on building a diverse, inclusive program grounded in real-world cloud native stories, while helping grow and strengthen the local community.

  • Program Committee Member for CNCF Events – I’ve contributed to several CNCF events as a program committee member, including Observability Day EU 2025, OpenTelemetry Community Day North America, and the Observability track at KubeCon NA.

  • Co-founder of Cloud Native Porto, a small local community growing cloud native awareness locally

What I’d bring to this role:

  • A strong and vocal end-user perspective, especially from the observability and platform engineering lens

  • Strong technical expertise in all things observability.

  • A community builder, driven by a passion for mentorship, collaboration, and growing communities.

The CNCF community has been a huge part of my own growth, and I’d love the opportunity to help shape how we talk about and build for operational resilience together. Thank you for considering me! You can find me on LinkedIn, and if we haven’t already met, I hope we will soon.

Iris

IrisDyr, Jul 07 '25 16:07

Hello, I'm Chris Larsen and I'm volunteering as a tech lead focusing on the observability domain.

I have worked in the observability space for 15+ years, starting with monitoring Internet radio during the infancy of streaming media. This led to working at Limelight Networks, monitoring a global CDN with static and streaming assets. We outgrew our RRDTool-based metrics system, so after looking around, I settled on contributing to OpenTSDB, an open source project on HBase based on a metrics system at Google prior to the release of Borgmon and Prometheus. After writing and releasing version 2, I joined Yahoo to continue maintaining OpenTSDB (rewriting it as version 3, with the start of a query language) and supporting various observability systems. Most recently, I moved to the observability telemetry team at Netflix, working across all of the telemetry types to correlate data and uncover insights for debugging and incident remediation. Additionally, I've presented at a number of conferences, from HBaseCon to KubeCon, and participate in local (SF Bay Area) observability meetups with large tech companies.

Working in observability is fascinating due to the complex issues of collecting, storing, and analyzing insane amounts of structured and unstructured data while balancing costs with useful insights. Lately, I've been focusing on tying together the various types of telemetry (metrics, logs, traces, events, profiles, etc.) with each other and with business data. To that end, @vjsamuel and I started the Query Standardization Working Group under the previous CNCF Observability TAG. I would like to continue this work as a tech lead in the OpR group, as well as continue helping support the CNCF observability community (such as reviewing the Perses sandbox application).

manolama, Jul 11 '25 18:07

Hi! My name is Carol Valencia, I'd like to propose my candidacy as TAG Operational Resilience Tech Lead.

My professional experience: Me-Linkedin

Over 10 years of experience in cloud technologies, including:

  • 4 years as a Site Reliability Engineer, focused on Kubernetes cluster operations, observability, and resource/cost optimization.
  • 3 years as a Cloud Security Architect, designing secure, scalable Kubernetes architectures.
  • The past 2 years dedicated to OpenTelemetry adoption and instrumentation, with a strong focus on Day 2 operations and performance optimization.

My journey in the CNCF ecosystem: Me-Github

Currently:
  • Maintainer for Spanish localization in key CNCF projects: Kubernetes (since 2020), CNCF Glossary (2021), and OpenTelemetry (2024) - Me LFX Profile
  • Lead of the self-assessment subproject, part of the Kubernetes SIG Security initiatives (2025).
  • Collaborate with the Linux Foundation on certification exam development and question review.
  • CNCF Ambassador since 2023.
  • KubeCon + CloudNativeCon program committees - Me credly
  • Organizer of the CNCF São Paulo, Brazil chapter (2023), and active supporter of the Lima, Peru chapter.
Past Contributions:
  • Served as shadow and lead in Docs and Comms on the Kubernetes Release Team.
  • Led the Spanish translation of the CNCF Cloud Native Security Whitepaper.
  • Keynote speaker and co-chair at KubeDay Colombia 2024.
  • Keynote panel speaker at KubeCon + CloudNativeCon North America 2023 (Chicago).
  • KCD São Paulo, Brazil (2024, 2025) and KCD Lima (2024, 2025).

Since 2020, I've given talks about Kubernetes, container security, and observability, and I've spoken at a number of KubeCons and other conferences (Me sessionize).

My Interest in Joining the TAG

I believe in the power of open source. As an OSS contributor and active community organizer, I’ve learned the importance of leading by listening, amplifying diverse voices, and fostering meaningful collaboration across technical and regional boundaries. Supporting adopters and end users across the cloud native ecosystem, together with innovators, maintainers, and passionate contributors, would make this role a lovely journey.

Thank you for taking the time to read my proposal and learn a bit about me. 😊

krol3, Jul 14 '25 04:07

Dear TOC,

Please accept my application for a Technical Lead role within the newly formed CNCF TAG for Operational Resilience. My experience, marked by over a decade of leadership in large-scale, cloud-native service architecture and platform engineering, particularly for search and observability services at AWS and currently at Apple, aligns perfectly with the demands of this critical role. At Apple, I lead the AIML observability platform, managing key services that operate at immense scale.

My deep commitment to open source dates back to the early days of my professional career. Recently, I co-founded the CNCF End-User Technical Advisory Board (TAB) in 2024 to ensure end-user open collaborations and perspectives are central to the projects of the CNCF. Currently, I serve on the End-User TAB representing Apple.

As a technical leader within the CNCF since 2019, I've been extensively involved in numerous observability projects. My contributions include serving as a maintainer and Governance Committee member for OpenTelemetry (OTel) since 2022. Within OTel, I've spearheaded key initiatives such as the AI semantic conventions SIG, the Prometheus-OTel interoperability SIG, and the client-side RUM effort. I've also helped significantly enhance metrics support in the OTel Collector, improved OTLP-Prometheus interoperability (including Prometheus remote write support), and promoted scaling improvements in the OpenTelemetry Operator. Furthermore, I've provided technical stewardship to AWS teams and currently advise Apple's teams on their contributions to CNCF observability projects like OpenTelemetry, Cortex, Prometheus, Thanos, Jaeger, and the K8s instrumentation SIG.

From 2021 to 2025, I co-chaired the Observability TAG, driving significant initiatives including the creation of the Observability Query Language Specification (QLS) workgroup and establishing the TAG Expert Speaker Series. I also substantially contributed to the TOC's review of critical observability projects such as SubStation, Perses, Inspektor Gadget, and Pixie. My expertise in observability, search, and building highly scalable services has been further shared through extensive presentations at international industry and cloud-native open source conferences.

I am excited by the opportunity to leverage my proven technical leadership, extensive industry experience, and deep subject matter expertise in observability to guide key initiatives and drive advancements across the entire operational resilience domain within the TAG. I am confident I can make an immediate and significant impact.

Thank you for your consideration.

Alolita Sharma

Reference links: LinkedIn, GitHub, Keynotes and Talks

alolita, Jul 14 '25 15:07

Thanks everyone for putting your nomination forward. With the nomination period now closed, we're going to temporarily lock the issues just so it's clear that the nomination period is over. We'll reopen soon with the election results.

riaankleinhans, Jul 14 '25 15:07

Announcing the Tech Leads for TAG Operational Resilience:

  • Carol Valencia @krol3 - 2 year term
  • Alolita Sharma @alolita - 1 year term

Thank you all for volunteering.

We encourage continued participation from everyone. There's plenty to do and many opportunities to come.

riaankleinhans, Jul 22 '25 16:07