Fail to fetch repo on arm-debug

Open aduh95 opened this issue 2 weeks ago • 8 comments

All https://ci.nodejs.org/job/node-test-commit-arm-debug/ builds have been failing for the past 48 hours:

 > git fetch --no-tags --force --progress -- git@github.com:nodejs/node.git +refs/heads/*:refs/remotes/origin/* +refs/pull/60711/head:refs/remotes/origin/_jenkins_local_branch # timeout=30
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from git@github.com:nodejs/node.git
	at PluginClassLoader for git//hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:998)
	at PluginClassLoader for git//hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1239)
	at PluginClassLoader for git//hudson.plugins.git.GitSCM._checkout(GitSCM.java:1310)
	at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1277)
	at hudson.scm.SCM.checkout(SCM.java:540)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1250)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:522)
	at hudson.model.Run.execute(Run.java:1860)
	at PluginClassLoader for matrix-project//hudson.matrix.MatrixRun.run(MatrixRun.java:153)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:460)
Caused by: hudson.plugins.git.GitException: Command "git fetch --no-tags --force --progress -- git@github.com:nodejs/node.git +refs/heads/*:refs/remotes/origin/* +refs/pull/60711/head:refs/remotes/origin/_jenkins_local_branch" returned status code 128:
stdout: 
stderr: ssh: Could not resolve hostname github.com: Temporary failure in name resolution
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

	at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2844)
	at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:2189)
	at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:638)
	at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:173)
	at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:164)
	at hudson.remoting.UserRequest.perform(UserRequest.java:225)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:391)
	at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:81)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:138)
	at java.base/java.lang.Thread.run(Thread.java:840)
	Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from 20.172.67.207/20.172.67.207:35820
		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1916)
		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:384)
		at hudson.remoting.Channel.call(Channel.java:1108)
		at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:155)
		at jdk.internal.reflect.GeneratedMethodAccessor321.invoke(Unknown Source)
		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.base/java.lang.reflect.Method.invoke(Method.java:569)
		at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:140)
		at PluginClassLoader for git-client/jdk.proxy76/jdk.proxy76.$Proxy162.execute(Unknown Source)
		at PluginClassLoader for git//hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:996)
		at PluginClassLoader for git//hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1239)
		at PluginClassLoader for git//hudson.plugins.git.GitSCM._checkout(GitSCM.java:1310)
		at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1277)
		at hudson.scm.SCM.checkout(SCM.java:540)
		at hudson.model.AbstractProject.checkout(AbstractProject.java:1250)
		at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
		at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
		at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:522)
		at hudson.model.Run.execute(Run.java:1860)
		at PluginClassLoader for matrix-project//hudson.matrix.MatrixRun.run(MatrixRun.java:153)
		at hudson.model.ResourceController.execute(ResourceController.java:101)
		at hudson.model.Executor.run(Executor.java:460)
ERROR: Error fetching remote repo 'origin'

aduh95 avatar Dec 05 '25 15:12 aduh95

Looks like non-debug builds on the hosts are also affected, e.g. https://ci.nodejs.org/job/node-test-commit-arm/nodes=ubuntu2204-arm64/61271/console

It looks like the issue is on the test-azure-ubuntu2404_docker-arm64-3 host:

$ ssh test-azure-ubuntu2404_docker-arm64-1 nslookup github.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.116.4

$ ssh test-azure-ubuntu2404_docker-arm64-2 nslookup github.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.113.3

$ ssh test-azure-ubuntu2404_docker-arm64-3 nslookup github.com
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; no servers could be reached

$
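
The same check can be scripted when more hosts or hostnames need comparing. The following is only a minimal sketch, assuming Python 3 is available on the host being tested; `check_resolution` and its `resolver` parameter are hypothetical names introduced here for illustration, not part of any nodejs/build tooling. It goes through the system resolver via `socket.getaddrinfo`, so on these machines it would exercise the same 127.0.0.53 stub that `nslookup` reports above.

```python
import socket
from typing import Callable, Dict, Iterable, List


def check_resolution(
    hostnames: Iterable[str],
    resolver: Callable = socket.getaddrinfo,  # hypothetical hook, handy for testing
) -> Dict[str, List[str]]:
    """Map each hostname to the sorted addresses it resolves to.

    A hostname that cannot be resolved maps to an empty list.
    """
    results: Dict[str, List[str]] = {}
    for name in hostnames:
        try:
            infos = resolver(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string for both IPv4 and IPv6.
            results[name] = sorted({info[4][0] for info in infos})
        except OSError:
            # socket.gaierror (e.g. "Temporary failure in name resolution")
            # is a subclass of OSError.
            results[name] = []
    return results
```

Calling `check_resolution(["github.com", "ci.nodejs.org"])` on a healthy host should return at least one address per name; on a host in the state shown for arm64-3, both lists would be expected to come back empty.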

richardlau avatar Dec 05 '25 15:12 richardlau

I logged into test-azure-ubuntu2404_docker-arm64-3 and it said it needed a system restart. So I restarted it (shutdown -r)... and it doesn't appear to have come back up yet. 🫤

richardlau avatar Dec 05 '25 16:12 richardlau

The system reboot was requested over 15 minutes ago, which I'd normally expect to be enough time for the machine to come back up (it is currently "connection refused" and the two containers on it are offline in Jenkins).

nodejs@test-azure-ubuntu2404-docker-arm64-3:~$ sudo shutdown -r

Broadcast message from root@test-azure-ubuntu24 on pts/1 (Fri 2025-12-05 16:00:16 UTC):

The system will reboot at Fri 2025-12-05 16:01:16 UTC!

Reboot scheduled for Fri 2025-12-05 16:01:16 UTC, use 'shutdown -c' to cancel.
nodejs@test-azure-ubuntu2404-docker-arm64-3:~$
Broadcast message from root@test-azure-ubuntu24 on pts/1 (Fri 2025-12-05 16:01:16 UTC):

The system will reboot now!

Connection to 20.172.67.207 closed by remote host.
Connection to 20.172.67.207 closed.

@ryanaslett (cc @bensternthal) will need help here as this is one of the machines under https://github.com/nodejs/build/issues/4133.

richardlau avatar Dec 05 '25 16:12 richardlau

Ah, the machine now appears to be up. Maybe it just took significantly longer than expected to restart.

richardlau avatar Dec 05 '25 17:12 richardlau

Hmm, but the Jenkins agents are still offline.

richardlau avatar Dec 05 '25 18:12 richardlau

The machine is still unable to resolve hosts (e.g. github.com). The Jenkins agents fail to start because the machine cannot resolve ci.nodejs.org.

richardlau avatar Dec 05 '25 18:12 richardlau

I was unable to determine why this machine could not communicate with the default Azure nameserver. It could connect to port 53, but it was almost behaving as if that internal nameserver had rate-limited it for some reason.

I went ahead and edited /etc/systemd/resolved.conf, pointing it at 1.1.1.1 and 8.8.8.8 as a backup, and it works now.

I guess chalk this up to "just Azure things"? I'm not sure.
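
For anyone auditing the other hosts, here is a rough sketch of reading the nameservers out of a `resolved.conf`. The exact lines written on arm64-3 are not quoted in this thread, so the `SAMPLE` text below (with `DNS=` and `FallbackDNS=` under `[Resolve]`, which are real systemd-resolved keys) is an assumption about how such an edit might look, and `configured_dns` is a hypothetical helper name.

```python
import configparser
from typing import Dict, List

# Assumed example content; the actual file on the host is not shown in the thread.
SAMPLE = """\
[Resolve]
DNS=1.1.1.1
FallbackDNS=8.8.8.8
"""


def configured_dns(conf_text: str) -> Dict[str, List[str]]:
    """Return the DNS and FallbackDNS server lists from resolved.conf text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case (DNS, FallbackDNS) instead of lowercasing
    parser.read_string(conf_text)
    if not parser.has_section("Resolve"):
        return {"DNS": [], "FallbackDNS": []}
    section = parser["Resolve"]
    # Values are space-separated address lists; commented-out keys are ignored.
    return {key: section.get(key, "").split() for key in ("DNS", "FallbackDNS")}


print(configured_dns(SAMPLE))
```

On a real host the text would come from reading `/etc/systemd/resolved.conf`; a stock file where every key is commented out would yield two empty lists, meaning the resolver falls back to whatever the network (here, Azure) hands out.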

ryanaslett avatar Dec 06 '25 00:12 ryanaslett

@ryanaslett Interesting. I've not experienced that with Azure x64 machines. Had it been set to 127.0.0.53, which seems to be the default for Azure Ubuntu/x64 installations?

sxa avatar Dec 08 '25 11:12 sxa