ozone
HDDS-10749. Shutdown datanode when RatisServer is down
What changes were proposed in this pull request?
Currently, when the RatisServer goes down (mainly because a long GC pause exceeds the Ratis close threshold), the Datanode keeps running and stays in the HEALTHY and IN_SERVICE state, which is confusing.
This task shuts down the Datanode after the RatisServer goes down.
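For illustration only, the threshold check behind this behavior can be sketched as below. This is a minimal sketch, not Ratis code: the class `PauseCheck` and both method names are hypothetical, and it assumes the pause is estimated as the time a monitor thread overslept beyond its intended sleep interval.

```java
import java.time.Duration;

/** Hypothetical illustration; these names are not taken from Ratis or Ozone. */
class PauseCheck {
  /** Time slept beyond the intended interval, treated here as the pause length. */
  static Duration extraSleep(Duration intendedSleep, Duration actualElapsed) {
    Duration extra = actualElapsed.minus(intendedSleep);
    return extra.isNegative() ? Duration.ZERO : extra;
  }

  /** The server is closed only when the pause is strictly longer than the threshold. */
  static boolean shouldClose(Duration pause, Duration closeThreshold) {
    return pause.compareTo(closeThreshold) > 0;
  }
}
```

With the values seen in the manual test log (a 93.265s pause against a 60s close threshold), `shouldClose` returns true, which is the case where the Datanode used to stay up without a working RatisServer.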
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10749
How was this patch tested?
Manual test
Log of a normal DN shutdown. XceiverServerRatis is stopped first ("Stopping XceiverServerRatis 01effdc6-dad1-4bf3-916a-749d9aa7e5e5"), then ContainerStateMachine is stopped ("Stopping ContainerStateMachine for group-5EA60976374E").
2024-04-24 17:53:21,589 ERROR ozone.HddsDatanodeService (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 2: SIGINT
2024-04-24 17:53:21,590 INFO ozone.HddsDatanodeService (StringUtils.java:lambda$startupShutdownMessage$0(144)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down HddsDatanodeService at SAMMICHEN-MB0/0.0.0.0
************************************************************/
2024-04-24 17:53:21,595 INFO ozoneimpl.OzoneContainer (OzoneContainer.java:stop(482)) - Attempting to stop container services.
2024-04-24 17:53:21,595 WARN ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:handleRemainingSleep(134)) - Background container scan was interrupted.
2024-04-24 17:53:21,595 INFO ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:run(61)) - Thread[ContainerMetadataScanner,5,main] exiting.
2024-04-24 17:53:21,595 INFO ozoneimpl.BackgroundContainerDataScanner (BackgroundContainerDataScanner.java:shutdown(141)) - ContainerDataScanner(/tmp/datanode1/storage/hdds) is shutting down.
2024-04-24 17:53:21,595 WARN ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:handleRemainingSleep(134)) - Background container scan was interrupted.
2024-04-24 17:53:21,596 INFO ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:run(61)) - ContainerDataScanner(/tmp/datanode1/storage/hdds, DS-af727dc0-66f9-4db9-8f1f-8ce487a40766) exiting.
2024-04-24 17:53:21,596 INFO ozoneimpl.OnDemandContainerDataScanner (OnDemandContainerDataScanner.java:shutdownScanner(206)) - On-demand container scanner is shutting down.
2024-04-24 17:53:21,606 INFO ratis.XceiverServerRatis (XceiverServerRatis.java:stop(604)) - Stopping XceiverServerRatis 01effdc6-dad1-4bf3-916a-749d9aa7e5e5
2024-04-24 17:53:21,606 INFO server.RaftServer (RaftServerProxy.java:lambda$close$9(416)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: close
2024-04-24 17:53:21,607 INFO server.RaftServer$Division (RaftServerImpl.java:lambda$close$3(526)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E: shutdown
2024-04-24 17:53:21,607 INFO server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcClientProtocolService now
2024-04-24 17:53:21,607 INFO util.JmxRegister (JmxRegister.java:unregister(73)) - Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-5EA60976374E,id=01effdc6-dad1-4bf3-916a-749d9aa7e5e5
2024-04-24 17:53:21,607 INFO impl.RoleInfo (RoleInfo.java:shutdownLeaderState(94)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-LeaderStateImpl
2024-04-24 17:53:21,610 INFO server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcClientProtocolService successfully
2024-04-24 17:53:21,610 INFO server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server GrpcServerProtocolService now
2024-04-24 17:53:21,611 INFO server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server GrpcServerProtocolService successfully
2024-04-24 17:53:21,611 INFO server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcAdminProtocolService now
2024-04-24 17:53:21,614 INFO server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcAdminProtocolService successfully
2024-04-24 17:53:21,614 INFO impl.PendingRequests (PendingRequests.java:sendNotLeaderResponses(289)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-PendingRequests: sendNotLeaderResponses
2024-04-24 17:53:21,620 INFO impl.StateMachineUpdater (StateMachineUpdater.java:stopAndJoin(157)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: set stopIndex = 2
2024-04-24 17:53:21,620 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(359)) - group-5EA60976374E: Taking a snapshot at:(t:2, i:2) file /tmp/datanode1/ratis/e9e7ba3c-7686-4b3a-96fd-5ea60976374e/sm/snapshot.2_2
2024-04-24 17:53:21,621 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(370)) - group-5EA60976374E: Finished taking a snapshot at:(t:2, i:2) file:/tmp/datanode1/ratis/e9e7ba3c-7686-4b3a-96fd-5ea60976374e/sm/snapshot.2_2 took: 1 ms
2024-04-24 17:53:21,622 INFO impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(295)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: Took a snapshot at index 2
2024-04-24 17:53:21,622 INFO impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(98)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: snapshotIndex: updateIncreasingly 0 -> 2
2024-04-24 17:53:21,623 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:close(1150)) - Stopping ContainerStateMachine for group-5EA60976374E.
2024-04-24 17:53:21,623 INFO server.RaftServer$Division (ServerState.java:close(427)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E: applyIndex: 2
2024-04-24 17:53:21,623 INFO util.AwaitToRun (AwaitToRun.java:run(49)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-cacheEviction-AwaitToRun-AwaitForSignal is interrupted
2024-04-24 17:53:21,695 INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:close(245)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-SegmentedRaftLogWorker close()
2024-04-24 17:53:21,697 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(152)) - JvmPauseMonitor-01effdc6-dad1-4bf3-916a-749d9aa7e5e5: Stopped
2024-04-24 17:53:23,783 INFO volume.HddsVolume (HddsVolume.java:closeDbStore(470)) - SchemaV3 db is stopped at /tmp/datanode1/storage/hdds/CID-9ba4109c-68b1-4311-9623-42f82149fb80/DS-af727dc0-66f9-4db9-8f1f-8ce487a40766/container.db for volume DS-af727dc0-66f9-4db9-8f1f-8ce487a40766
2024-04-24 17:53:23,783 INFO utils.BackgroundService (BackgroundService.java:shutdown(160)) - Shutting down service BlockDeletingService
2024-04-24 17:53:23,784 INFO utils.BackgroundService (BackgroundService.java:shutdown(160)) - Shutting down service StaleRecoveringContainerScrubbingService
2024-04-24 17:53:23,785 INFO statemachine.DatanodeStateMachine (DatanodeStateMachine.java:stopDaemon(640)) - Ozone container server stopped.
2024-04-24 17:53:23,790 INFO handler.ContextHandler (ContextHandler.java:doStop(1159)) - Stopped o.e.j.w.WebAppContext@3baf6936{hddsDatanode,/,null,STOPPED}{file:/Users/sammi/workspace/hadoop-ozone/hadoop-hdds/container-service/target/classes/webapps/hddsDatanode}
2024-04-24 17:53:23,794 INFO server.AbstractConnector (AbstractConnector.java:doStop(383)) - Stopped ServerConnector@4f453e63{HTTP/1.1, (http/1.1)}{SAMMICHEN-MB0:9882}
2024-04-24 17:53:23,794 INFO server.session (HouseKeeper.java:stopScavenging(149)) - node0 Stopped scavenging
2024-04-24 17:53:23,794 INFO handler.ContextHandler (ContextHandler.java:doStop(1159)) - Stopped o.e.j.s.ServletContextHandler@1816e24a{static,/static,file:///Users/sammi/workspace/hadoop-ozone/hadoop-hdds/container-service/target/classes/webapps/static,STOPPED}
2024-04-24 17:53:23,795 INFO ozone.HddsDatanodeClientProtocolServer (HddsDatanodeClientProtocolServer.java:stop(83)) - Stopping the RPC server for Client Protocol
2024-04-24 17:53:23,795 INFO ipc.Server (Server.java:stop(3523)) - Stopping server on 19864
2024-04-24 17:53:23,796 INFO ipc.Server (Server.java:run(1434)) - Stopping IPC Server listener on 19864
2024-04-24 17:53:23,796 INFO ipc.Server (Server.java:run(1567)) - Stopping IPC Server Responder
Log of a DN shutdown triggered by the Ratis server shutting down. ContainerStateMachine is closed first ("Container statemachine is closed by ratis, terminating HddsDatanodeService"), then XceiverServerRatis is stopped ("Stopping XceiverServerRatis 01effdc6-dad1-4bf3-916a-749d9aa7e5e5").
2024-04-24 18:06:16,666 WARN util.JvmPauseMonitor (JvmPauseMonitor.java:detectPause(168)) - JvmPauseMonitor-01effdc6-dad1-4bf3-916a-749d9aa7e5e5: Detected pause in JVM or host machine approximately 93.265s without any GCs.
2024-04-24 18:06:16,666 ERROR server.RaftServer (RaftServerProxy.java:handleJvmPause(237)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: JVM pause detected 93.265s longer than the close-threshold 60s, shutting down ...
2024-04-24 18:06:16,678 INFO server.RaftServer (RaftServerProxy.java:lambda$close$9(416)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: close
2024-04-24 18:06:16,684 INFO server.RaftServer$Division (RaftServerImpl.java:lambda$close$3(526)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E: shutdown
2024-04-24 18:06:16,685 INFO server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcClientProtocolService now
2024-04-24 18:06:16,690 INFO util.JmxRegister (JmxRegister.java:unregister(73)) - Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-5EA60976374E,id=01effdc6-dad1-4bf3-916a-749d9aa7e5e5
2024-04-24 18:06:16,691 INFO impl.RoleInfo (RoleInfo.java:shutdownLeaderState(94)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-LeaderStateImpl
2024-04-24 18:06:16,724 INFO impl.PendingRequests (PendingRequests.java:sendNotLeaderResponses(289)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-PendingRequests: sendNotLeaderResponses
2024-04-24 18:06:16,727 INFO server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcClientProtocolService successfully
2024-04-24 18:06:16,727 INFO server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server GrpcServerProtocolService now
2024-04-24 18:06:16,728 INFO impl.StateMachineUpdater (StateMachineUpdater.java:stopAndJoin(157)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: set stopIndex = 4
2024-04-24 18:06:16,729 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(359)) - group-5EA60976374E: Taking a snapshot at:(t:3, i:4) file /tmp/datanode1/ratis/e9e7ba3c-7686-4b3a-96fd-5ea60976374e/sm/snapshot.3_4
2024-04-24 18:06:16,729 INFO server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server GrpcServerProtocolService successfully
2024-04-24 18:06:16,729 INFO server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcAdminProtocolService now
2024-04-24 18:06:16,732 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(370)) - group-5EA60976374E: Finished taking a snapshot at:(t:3, i:4) file:/tmp/datanode1/ratis/e9e7ba3c-7686-4b3a-96fd-5ea60976374e/sm/snapshot.3_4 took: 4 ms
2024-04-24 18:06:16,733 INFO server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcAdminProtocolService successfully
2024-04-24 18:06:16,734 INFO impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(295)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: Took a snapshot at index 4
2024-04-24 18:06:16,734 INFO impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(98)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: snapshotIndex: updateIncreasingly 2 -> 4
2024-04-24 18:06:16,740 ERROR ratis.ContainerStateMachine (ContainerStateMachine.java:close(1142)) - Container statemachine is closed by ratis, terminating HddsDatanodeService
2024-04-24 18:06:26,754 INFO ozoneimpl.OzoneContainer (OzoneContainer.java:stop(482)) - Attempting to stop container services.
2024-04-24 18:06:26,754 WARN ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:handleRemainingSleep(134)) - Background container scan was interrupted.
2024-04-24 18:06:26,754 INFO ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:run(61)) - Thread[ContainerMetadataScanner,5,main] exiting.
2024-04-24 18:06:26,755 INFO ozoneimpl.BackgroundContainerDataScanner (BackgroundContainerDataScanner.java:shutdown(141)) - ContainerDataScanner(/tmp/datanode1/storage/hdds) is shutting down.
2024-04-24 18:06:26,755 WARN ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:handleRemainingSleep(134)) - Background container scan was interrupted.
2024-04-24 18:06:26,755 INFO ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:run(61)) - ContainerDataScanner(/tmp/datanode1/storage/hdds, DS-af727dc0-66f9-4db9-8f1f-8ce487a40766) exiting.
2024-04-24 18:06:26,755 INFO ozoneimpl.OnDemandContainerDataScanner (OnDemandContainerDataScanner.java:shutdownScanner(206)) - On-demand container scanner is shutting down.
2024-04-24 18:06:26,756 INFO ratis.XceiverServerRatis (XceiverServerRatis.java:stop(604)) - Stopping XceiverServerRatis 01effdc6-dad1-4bf3-916a-749d9aa7e5e5
2024-04-24 18:06:26,757 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(152)) - JvmPauseMonitor-01effdc6-dad1-4bf3-916a-749d9aa7e5e5: Stopped
2024-04-24 18:06:28,892 INFO volume.HddsVolume (HddsVolume.java:closeDbStore(470)) - SchemaV3 db is stopped at /tmp/datanode1/storage/hdds/CID-9ba4109c-68b1-4311-9623-42f82149fb80/DS-af727dc0-66f9-4db9-8f1f-8ce487a40766/container.db for volume DS-af727dc0-66f9-4db9-8f1f-8ce487a40766
2024-04-24 18:06:28,893 INFO utils.BackgroundService (BackgroundService.java:shutdown(160)) - Shutting down service BlockDeletingService
2024-04-24 18:06:28,893 INFO utils.BackgroundService (BackgroundService.java:shutdown(160)) - Shutting down service StaleRecoveringContainerScrubbingService
2024-04-24 18:06:28,894 INFO statemachine.DatanodeStateMachine (DatanodeStateMachine.java:stopDaemon(640)) - Ozone container server stopped.
2024-04-24 18:06:28,899 INFO handler.ContextHandler (ContextHandler.java:doStop(1159)) - Stopped o.e.j.w.WebAppContext@5fbdc49b{hddsDatanode,/,null,STOPPED}{file:/Users/sammi/workspace/hadoop-ozone/hadoop-hdds/container-service/target/classes/webapps/hddsDatanode}
2024-04-24 18:06:28,903 INFO server.AbstractConnector (AbstractConnector.java:doStop(383)) - Stopped ServerConnector@7fc7c4a{HTTP/1.1, (http/1.1)}{SAMMICHEN-MB0:9882}
2024-04-24 18:06:28,903 INFO server.session (HouseKeeper.java:stopScavenging(149)) - node0 Stopped scavenging
2024-04-24 18:06:28,903 INFO handler.ContextHandler (ContextHandler.java:doStop(1159)) - Stopped o.e.j.s.ServletContextHandler@76c387f9{static,/static,file:///Users/sammi/workspace/hadoop-ozone/hadoop-hdds/container-service/target/classes/webapps/static,STOPPED}
2024-04-24 18:06:28,904 INFO ozone.HddsDatanodeClientProtocolServer (HddsDatanodeClientProtocolServer.java:stop(83)) - Stopping the RPC server for Client Protocol
2024-04-24 18:06:28,905 INFO ipc.Server (Server.java:stop(3523)) - Stopping server on 19864
2024-04-24 18:06:28,905 INFO ipc.Server (Server.java:run(1434)) - Stopping IPC Server listener on 19864
2024-04-24 18:06:28,905 INFO ipc.Server (Server.java:run(1567)) - Stopping IPC Server Responder
2024-04-24 18:06:28,908 INFO util.ExitUtil (ExitUtil.java:terminate(241)) - Exiting with status 1: ExitException
2024-04-24 18:06:28,909 INFO ozone.HddsDatanodeService (StringUtils.java:lambda$startupShutdownMessage$0(144)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down HddsDatanodeService at SAMMICHEN-MB0/0.0.0.0
************************************************************/
Process finished with exit code 1
@adoroszlai, I noticed the impact on the integration tests too. It looks like terminating the DN from ContainerStateMachine is not a good idea. Let me think about whether there are other solutions.
Waiting for a Ratis release that includes https://issues.apache.org/jira/browse/RATIS-2066.
I was made aware that for OM if Ratis server experiences a long pause, Ratis state machine crashes itself and that shuts down OM: https://issues.apache.org/jira/browse/HDDS-6141
Both OM and SCM shut themselves down after a long pause.
Manually stop the datanode; related datanode log:
2024-07-01 17:34:54,237 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:notifyServerShutdown(936)) - group-2D6AB2E224A3 is closed by HddsDatanodeService
2024-07-01 17:34:54,774 INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:close(248)) - 9c367fb6-68b0-487d-bb10-3e8c0da9b148@group-6CC213E8C815-SegmentedRaftLogWorker close()
2024-07-01 17:34:54,775 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:notifyServerShutdown(936)) - group-6CC213E8C815 is closed by HddsDatanodeService
2024-07-01 17:34:54,805 INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:close(248)) - 9c367fb6-68b0-487d-bb10-3e8c0da9b148@group-86A881EBB3A5-SegmentedRaftLogWorker close()
2024-07-01 17:34:54,812 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:notifyServerShutdown(936)) - group-86A881EBB3A5 is closed by HddsDatanodeService
Manually pause the DN process and then resume it:
2024-07-01 17:56:07,572 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:notifyServerShutdown(896)) - group-60029B7F6B87 is closed by ratis
2024-07-01 17:56:07,585 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:notifyServerShutdown(896)) - group-86A881EBB3A5 is closed by ratis
2024-07-01 17:56:07,586 INFO ratis.ContainerStateMachine (ContainerStateMachine.java:notifyServerShutdown(896)) - group-6CC213E8C815 is closed by ratis
2024-07-01 17:56:12,580 ERROR ratis.ContainerStateMachine (ContainerStateMachine.java:lambda$notifyServerShutdown$9(916)) - Container statemachine is closed by ratis, terminating HddsDatanodeService. closed(3)/total(3)
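The two logs above distinguish a group closed by HddsDatanodeService from one closed by Ratis, and only the latter leads to termination. A minimal sketch of that decision follows. It is not the actual ContainerStateMachine code: the class `RatisShutdownTracker`, its methods, and the rule of terminating once every group is closed are assumptions made for illustration, based only on the "closed(3)/total(3)" log line.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper; not the actual Ozone implementation. */
class RatisShutdownTracker {
  private final int totalGroups;
  private final Runnable terminateDatanode;
  private final AtomicBoolean datanodeStopping = new AtomicBoolean(false);
  private final AtomicBoolean terminated = new AtomicBoolean(false);
  private final Set<String> closedGroups = ConcurrentHashMap.newKeySet();

  RatisShutdownTracker(int totalGroups, Runnable terminateDatanode) {
    this.totalGroups = totalGroups;
    this.terminateDatanode = terminateDatanode;
  }

  /** Called when the datanode itself starts a normal shutdown. */
  void markDatanodeStopping() {
    datanodeStopping.set(true);
  }

  /** Records a closed group and returns a message in the style of the logs. */
  String notifyServerShutdown(String groupId) {
    closedGroups.add(groupId);
    if (datanodeStopping.get()) {
      // Normal shutdown: the datanode is already stopping, nothing to terminate.
      return groupId + " is closed by HddsDatanodeService";
    }
    // Closed by Ratis: terminate once, after all groups are closed (assumption).
    if (closedGroups.size() >= totalGroups
        && terminated.compareAndSet(false, true)) {
      terminateDatanode.run();
    }
    return groupId + " is closed by ratis";
  }

  boolean isTerminated() {
    return terminated.get();
  }
}
```

The `compareAndSet` guard keeps the terminate callback from running more than once when several groups report their shutdown at nearly the same time.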
All three failed misc acceptance runs are due to
failed to solve: process "/bin/sh -c sudo yum install -y openssh-clients openssh-server" did not complete successfully: exit code: 1
The current logs do not show why it failed. @adoroszlai, do you have any idea about this issue?
Looks like the problem is
> [om 2/15] RUN sudo yum install -y openssh-clients openssh-server:
#0 0.519 Loaded plugins: fastestmirror, ovl
#0 0.783 Determining fastest mirrors
#0 1.328 Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=aarch64&repo=os&infra=container error was
#0 1.328 14: curl#6 - "Could not resolve host: mirrorlist.centos.org; Unknown error"
#0 1.338
#0 1.338
#0 1.338 One of the configured repositories failed (Unknown),
#0 1.338 and yum doesn't have enough cached data to continue. At this point the only
#0 1.338 safe thing yum can do is fail. There are a few ways to work "fix" this:
#0 1.338
#0 1.338 1. Contact the upstream for the repository and get them to fix the problem.
#0 1.338
#0 1.338 2. Reconfigure the baseurl/etc. for the repository, to point to a working
#0 1.338 upstream. This is most often useful if you are using a newer
#0 1.338 distribution release than is supported by the repository (and the
#0 1.338 packages for the previous distribution release still work).
#0 1.338
#0 1.338 3. Run the command with the repository temporarily disabled
#0 1.338 yum --disablerepo=<repoid> ...
#0 1.338
#0 1.338 4. Disable the repository permanently, so yum won't use it by default. Yum
#0 1.338 will then just ignore the repository until you permanently enable it
#0 1.338 again or use --enablerepo for temporary usage:
#0 1.338
#0 1.338 yum-config-manager --disable <repoid>
#0 1.338 or
#0 1.338 subscription-manager repos --disable=<repoid>
#0 1.338
#0 1.338 5. Configure the failing repository to be skipped, if it is unavailable.
#0 1.338 Note that yum will try to contact the repo. when it runs most commands,
#0 1.338 so will have to try and fail each time (and thus. yum will be be much
#0 1.338 slower). If it is a very temporary problem though, this is often a nice
#0 1.338 compromise:
#0 1.338
#0 1.338 yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
#0 1.338
#0 1.338 Cannot find a valid baseurl for repo: base/7/aarch64
------
failed to solve: process "/bin/sh -c sudo yum install -y openssh-clients openssh-server" did not complete successfully: exit code: 1
@ChenSammi please see #6893. This should be OK after merging from master.
Looks like all comments are addressed. HDDS-11092 is merged into HDDS-7593 so the previous error is no longer seen.
Thanks @smengcl @jojochuang @adoroszlai for the review.