
test-ibmcloud-ubuntu1604-x64-1 cannot clean workspace

Open sophia-guo opened this issue 1 year ago • 6 comments

test-ibmcloud-ubuntu1604-x64-1

ERROR: Cannot delete workspace :Unable to delete '/home/jenkins/workspace/Grinder/jvmtest/system/reproducibleCompare/temurin-build/.azure-devops/README.md'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.

https://ci.adoptium.net/view/Test_grinder/job/Grinder/9302/

@sxa could you help with this? And could you check the permissions on this file? It's odd, because it's a normal file in the Grinder job's workspace.

sophia-guo avatar Mar 26 '24 14:03 sophia-guo

The whole of the temurin-build directory is owned by root:root, including that file, which I guess is just the first one in order when you look at the checked-out directory.

root@test-ibmcloud-ubuntu1604-x64-1:/home/jenkins/workspace/Grinder/jvmtest/system/reproducibleCompare# ls -ld temurin-build/.azure-devops/README.md 
-rw-r--r-- 1 root root 3417 Mar 25 20:36 temurin-build/.azure-devops/README.md

The directory has now been deleted.
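
For anyone wanting to check a workspace for the same symptom before cleanup fails, a minimal sketch follows. It assumes the workspace should be owned entirely by the jenkins user (as the /home/jenkins path suggests); the function name `list_foreign_owned` is made up for illustration.

```shell
#!/bin/sh
# List every path under a directory that is NOT owned by the expected user.
# Arguments: $1 = directory to scan, $2 = expected owner (user name or numeric UID).
# Any output means the expected user may be unable to delete the workspace.
list_foreign_owned() {
    find "$1" ! -user "$2" -print
}

# Example call (path taken from this issue, owner is an assumption):
# list_foreign_owned /home/jenkins/workspace/Grinder jenkins
```

An empty result means everything under the directory belongs to the expected user. Root-owned entries such as the temurin-build checkout above would be printed one per line.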

sxa avatar Mar 26 '24 17:03 sxa

Confused. The same build ran on the same agent in https://ci.adoptium.net/view/Test_grinder/job/Grinder/9301/console, which doesn't have this issue and deleted the workspace correctly.

sophia-guo avatar Mar 26 '24 19:03 sophia-guo

I'm hitting the same issue. Can I get root permission to this agent to see what happened? The same issue happens on test-ibmcloud-rhel7-x64-1 and test-equinix_esxi-ubuntu2204-x64-1. It feels like there is a common hidden issue.

sophia-guo avatar Mar 27 '24 17:03 sophia-guo

https://github.com/adoptium/infrastructure/issues/3490#issuecomment-2023337971

@sxa

sophia-guo avatar Mar 27 '24 17:03 sophia-guo

The same issue happens on test-ibmcloud-rhel7-x64-1 and test-equinix_esxi-ubuntu2204-x64-1. It feels like there is a common hidden issue.

I haven't seen the design of what you're doing, other than that it replicates the existing reproducible build jobs via the AQA test process and machines, but normally this happens if you run things as root inside the container while using a volume mapped from the host. The normal build processes always run as the jenkins user inside the container. It's also possible that it's related to which docker implementation is on each machine, although I wouldn't expect any problems with the ubuntu2204 system (note that test-equinix_esxi-ubuntu2204-x64-1 will be decommissioned very soon as part of https://github.com/adoptium/infrastructure/issues/3292).
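
As a rough illustration of the root-in-container point, one common way to avoid root-owned files on a mapped volume is to pass the caller's UID and GID to `docker run` with `--user`. The sketch below only prints the command rather than running it. The image name, the /workspace mount point, and the `build_docker_cmd` helper are all hypothetical, and this is not necessarily how the Grinder or temurin-build jobs invoke docker.

```shell
#!/bin/sh
# Print (do not execute) a docker command that runs as the calling user,
# so files created on the mapped host directory keep that user's ownership.
# Arguments: $1 = host directory to map, $2 = image, remaining = command to run.
build_docker_cmd() {
    host_dir=$1
    image=$2
    shift 2
    # --user takes UID:GID; -v maps the host directory into the container.
    printf 'docker run --rm --user %s:%s -v %s:/workspace -w /workspace %s' \
        "$(id -u)" "$(id -g)" "$host_dir" "$image"
    # Append the command and its arguments, if any were given.
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Example call (image and script names are placeholders):
# build_docker_cmd "$PWD" example/build-image:latest ./build.sh
```

Run as the jenkins user on the host, the printed command would start the container with jenkins's numeric IDs instead of root, assuming the image works when run as a non-root user.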

Can I get root permission to this agent to see what happened?

If you need that access, please use the normal process of raising an issue specifically for that. If you're able to do the investigation today then test-equinix_esxi-ubuntu2204-x64-1 might be the best choice.

sxa avatar Mar 28 '24 10:03 sxa

The same issue happens on test-ibmcloud-rhel7-x64-1 and test-equinix_esxi-ubuntu2204-x64-1.

I've cleaned those two up too. (Path name for reference: temurin-build under /home/jenkins/workspace/Grinder/jvmtest/system/reproducibleCompare)

sxa avatar Mar 28 '24 15:03 sxa

Closing this for now as the immediate problem has been fixed, but it may come back :-)

sxa avatar Apr 24 '24 12:04 sxa