SONiC
[doc] Monitoring and Auto-mitigating the unhealthy of docker containers in SONiC
This document introduces the motivation and design for monitoring and auto-mitigating unhealthy docker containers in SONiC.
What's the plan for this feature? Is it proceeding?
@ben-gale: Yes, it is proceeding. Most of the infrastructure is already in place in the master and 201911 branches.
Thanks Joe - timeline for the code PRs to master?
@yozhao101: Can you please add a comment here with links to all the related PRs in sonic-buildimage and sonic-utilities thus far?
Yes, I will update with links to the PRs.
Thanks Joe - timeline for the code PRs to master? @jleveque
This document introduces three features which we plan to deploy into SONiC:

1. We propose to employ Monit to monitor the running status of critical processes in docker containers. The PRs for this proposal in the public SONiC repo are as follows:
https://github.com/Azure/sonic-buildimage/pull/3940
https://github.com/Azure/sonic-buildimage/pull/4033
https://github.com/Azure/sonic-buildimage/pull/4706

2. We propose to employ the process monitoring/notification framework of supervisord to implement the auto-restart feature of docker containers. The PRs for this proposal in the public SONiC repo are as follows:
[process monitoring/notification framework] https://github.com/Azure/sonic-buildimage/pull/2852/files
[process monitoring/notification framework] https://github.com/Azure/sonic-buildimage/pull/4073
[Syncd] https://github.com/Azure/sonic-buildimage/pull/3534/files
[SWSS] https://github.com/Azure/sonic-buildimage/pull/2852/files https://github.com/Azure/sonic-buildimage/pull/2845/files
[SNMP] https://github.com/Azure/sonic-buildimage/pull/3650
[DHCP_Relay] https://github.com/Azure/sonic-buildimage/pull/3667
[Radv] https://github.com/Azure/sonic-buildimage/pull/3681
[PMon] https://github.com/Azure/sonic-buildimage/pull/3689
[Teamd] https://github.com/Azure/sonic-buildimage/pull/3703
[LLDP] https://github.com/Azure/sonic-buildimage/pull/3713
[Sflow] https://github.com/Azure/sonic-buildimage/pull/3751
[Telemetry] https://github.com/Azure/sonic-buildimage/pull/3768
[Database] https://github.com/Azure/sonic-buildimage/pull/4138
[BGP] https://github.com/Azure/sonic-buildimage/pull/4207
[NAT] https://github.com/Azure/sonic-buildimage/pull/4208

[CLI to check the state of autorestart feature of each container] https://github.com/Azure/sonic-utilities/pull/798 https://github.com/Azure/sonic-utilities/pull/801
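For readers unfamiliar with the supervisord notification framework mentioned in item 2, here is a minimal sketch of an event listener that follows supervisord's documented listener protocol (READY / header line / payload / RESULT). It is not the code from the PRs above. The function names, the set of critical process names, the rule of reacting only to unexpected exits, and the choice to send SIGTERM to the parent supervisord are all assumptions made for illustration.

```python
import os
import signal
import sys


def parse_tokens(line):
    """Parse a supervisord 'key:value key:value' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def is_critical_exit(headers, payload, critical_processes):
    """Return True if the event reports an unexpected exit of a critical process.

    Assumption: only PROCESS_STATE_EXITED events with expected:0 count.
    """
    if headers.get("eventname") != "PROCESS_STATE_EXITED":
        return False
    info = parse_tokens(payload)
    return (info.get("expected") == "0"
            and info.get("processname") in critical_processes)


def main(critical_processes):
    """Run the listener loop; supervisord talks to it over stdin/stdout."""
    while True:
        # Tell supervisord this listener is ready for the next event.
        sys.stdout.write("READY\n")
        sys.stdout.flush()

        header_line = sys.stdin.readline()
        if not header_line:
            break  # stdin closed, supervisord is gone
        headers = parse_tokens(header_line)
        payload = sys.stdin.read(int(headers["len"]))

        if is_critical_exit(headers, payload, critical_processes):
            # Hypothetical mitigation: stop supervisord (the parent process)
            # so the container exits and an outer service manager can
            # restart it.
            os.kill(os.getppid(), signal.SIGTERM)

        # Acknowledge the event: "RESULT <length>\n" followed by the body.
        sys.stdout.write("RESULT 2\nOK")
        sys.stdout.flush()
```

Such a script would be registered in an `[eventlistener:...]` section of the supervisord configuration with `events=PROCESS_STATE_EXITED`, and its entry point would call `main()` with the container's critical process names, for example `main({"orchagent"})` (an illustrative name only).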
Thx