Test Jenkins job failed with `ERROR: Cannot delete workspace :Malformed input or input contains unmappable characters` on some machines

Open sophia-guo opened this issue 3 years ago • 21 comments

We have seen this failure a few times. The Jenkins job fails at the post stage with `ERROR: Cannot delete workspace :Malformed input or input contains unmappable characters`.

https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_linux_testList_1/45/console

[Pipeline] cleanWs
[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Deferred wipeout is disabled by the job configuration...
ERROR: Cannot delete workspace :Malformed input or input contains unmappable characters: /home/jenkins/workspace/Test_openjdk8_hs_extended.openjdk_x86-64_linux_testList_1/aqa-tests/TKG/output_16555602234151/jdk_tools_1/work/scratch/1/�@�@8
[Pipeline] }
[Pipeline] // timeout
[Pipeline] echo
Exception: hudson.AbortException: Cannot delete workspace: Malformed input or input contains unmappable characters: /home/jenkins/workspace/Test_openjdk8_hs_extended.openjdk_x86-64_linux_testList_1/aqa-tests/TKG/output_16555602234151/jdk_tools_1/work/scratch/1/�@�@8

The failed job above is on test-docker-ubuntu2204-x64-1.

Another occurrence was recorded on aarch64: https://github.com/adoptium/aqa-tests/issues/3594#issuecomment-1103826881

According to https://support.cloudbees.com/hc/en-us/articles/360004397911-How-to-address-issues-with-unmappable-characters- the build agent’s JVM is most likely the culprit.

Setting the agent JVM options as suggested and restarting the agent might help.

sophia-guo avatar Jun 22 '22 15:06 sophia-guo

same failure on test-docker-ubuntu2004-armv7l-2

sophia-guo avatar Jun 28 '22 19:06 sophia-guo

same failure on test-docker-ubuntu2010-armv8l-2 https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_aarch64_linux/186/console

sophia-guo avatar Jun 29 '22 18:06 sophia-guo

https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_aarch64_linux/189/console

test-docker-ubuntu2004-armv8l-2

https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_arm_linux/154/ test-docker-ubuntu2004-armv7l-2

sophia-guo avatar Jul 11 '22 16:07 sophia-guo

https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_aarch64_linux/191/console

test-docker-ubuntu2110-armv8l-1

sophia-guo avatar Jul 11 '22 16:07 sophia-guo

It feels like most recent sanity.openjdk builds on aarch64 and arm32 are affected. This has also happened to compliance builds. Is there a way to escalate this issue's priority?

sophia-guo avatar Jul 11 '22 16:07 sophia-guo

test-docker-ubuntu1804-armv8l-2 https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_aarch64_linux/192/

test-docker-ubuntu2004-armv7l-1 in /home/jenkins/workspace/Test_openjdk17_hs_sanity.openjdk_arm_linux [Pipeline] {

sophia-guo avatar Jul 13 '22 19:07 sophia-guo

If the problem isn't resolved, it will happen at the initial cleanup stage, which means the job fails before any tests run. Hence tagging this as critical, as suggested by @sxa.

test-docker-ubuntu2004-armv7l-2

(https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_arm_linux/158/console)

sophia-guo avatar Jul 13 '22 19:07 sophia-guo

That's interesting ... The default encoding on the first one I checked (the Ubuntu 22.04 machine) is ANSI_X3.4-1968. It can be reset either by setting LANG in the environment to a suitable UTF-8 value, or with the `-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8` properties mentioned in the article you linked to.
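To illustrate why that default matters: ANSI_X3.4-1968 is the formal name of US-ASCII, so a file name containing bytes outside the 7-bit range cannot be mapped to characters under it. The sketch below uses Python rather than the agent's JVM, and the sample bytes are a made-up file name, not the one from the failing job. It only shows the analogous strict-decoding failure and should not be read as what Jenkins does internally.

```python
import codecs

# ANSI_X3.4-1968 is registered as an alias of ASCII in Python's codec table.
ENCODING = "ANSI_X3.4-1968"


def can_decode(name_bytes: bytes, encoding: str) -> bool:
    """Return True if the raw file-name bytes map cleanly to characters."""
    try:
        name_bytes.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical file name: "é8" encoded as UTF-8 (bytes 0xC3 0xA9 0x38).
sample = "é8".encode("utf-8")

print(codecs.lookup(ENCODING).name)   # canonical codec name behind the alias
print(can_decode(sample, ENCODING))   # strict ASCII decoding of the name
print(can_decode(sample, "UTF-8"))    # UTF-8 decoding of the same bytes
```

A plain ASCII name such as `b"plain.txt"` decodes under either encoding, which would be consistent with the failure only appearing once a test leaves a non-ASCII file name behind.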

sxa avatar Jul 14 '22 12:07 sxa

@sophia-guo Can you see if this is the result of a test that has been unexcluded in the last few months? I want to fix it regardless but it would just be good to know why it's occurred, since it's going to be a bit of work to get this changed on every possible machine.

sxa avatar Jul 14 '22 13:07 sxa

Verified that the problem does not occur if I add the encoding defines to the agent JVM, or if LANG=C.UTF-8 is set in the environment. Assuming the latter does not cause any side effects for other tests, that would likely be my preferred solution (it's easy to put in the dockerfiles), but I think in the short term I can look at modifying the jenkins configurations.

It looks like this is only affecting docker containers, so the ultimate solution should be something that can be put into the static dockerfiles.
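As a rough way to check whether a container's environment would select a UTF-8 locale, the sketch below applies the POSIX precedence for the character-type category (LC_ALL, then LC_CTYPE, then LANG). The function names are hypothetical, and the check only parses environment variables; it does not confirm that the named locale is actually installed in the image or how a particular JVM will interpret it.

```python
import os


def effective_ctype_locale(env: dict) -> str:
    """Pick the locale name that governs character encoding."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"  # nothing set: the POSIX/C locale applies


def is_utf8_locale(env: dict) -> bool:
    """Return True if the effective locale names a UTF-8 codeset."""
    locale_name = effective_ctype_locale(env)
    # Locale names look like language[_territory][.codeset][@modifier].
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


# Report on the environment this script is running in.
print(effective_ctype_locale(dict(os.environ)))
print(is_utf8_locale(dict(os.environ)))
```

Note that a UTF-8 `LANG` can still be overridden by `LC_ALL=C`, so a check like this looks at all three variables rather than `LANG` alone.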

sxa avatar Jul 14 '22 13:07 sxa

All except [test-docker-alpine314-x64-2](https://ci.adoptopenjdk.net/computer/test%2Ddocker%2Dalpine314%2Dx64%2D2), test-docker-alpine314-x64-1 and test-docker-alpine311-x64-1 should now be ok. (Those three currently have jobs running on them so I cannot restart the agent yet.)

sxa avatar Jul 14 '22 14:07 sxa

Last three done.

sxa avatar Jul 14 '22 15:07 sxa

Reran on test-docker-ubuntu2004-armv7l-2 (https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/5198/): no issue.

sophia-guo avatar Jul 14 '22 17:07 sophia-guo

test-docker-fedora35-armv8l-1 is the one machine without the issue in the latest run. It might be helpful to compare its configuration or dockerfile with the others. https://ci.adoptopenjdk.net/job/Test_openjdk18_hs_sanity.openjdk_aarch64_linux/129/

sophia-guo avatar Jul 14 '22 17:07 sophia-guo

More information about one of the problematic tests, java/lang/invoke/lambda/LambdaFileEncodingSerialization.java: it was added by @andrew-m-leonard specifically to verify that a deserializeLambda containing a non-ASCII mappable char is correctly handled as UTF-8 (https://bugs.openjdk.org/browse/JDK-8248231), and was then problemlisted on linux-x64 due to https://bugs.openjdk.org/browse/JDK-8249079. (Note: that is not the same issue as this one.)

In Adoptium the test was disabled for 18+ on arm because of the same issue as https://bugs.openjdk.org/browse/JDK-8249079, which is why we saw this issue only on arm jdk17 and aarch64 17+. Based on this information I will disable the tests for 17+ on all platforms.

sophia-guo avatar Jul 14 '22 17:07 sophia-guo

Since it hasn't been listed anywhere else in here, the grinder recreate options for this are:

  • BUILD_LIST=openjdk
  • TARGET=jdk_custom
  • CUSTOM_TARGET=java/lang/invoke/lambda/LambdaFileEncodingSerialization.java

(Or use TARGET=jdk_lang_0 since that's the group this test is in)

sxa avatar Jul 15 '22 09:07 sxa

test-docker-ubuntu2004-armv7l-1 still has the same issue. See last Friday's (July 15) run: https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_arm_linux/160/console

sophia-guo avatar Jul 17 '22 18:07 sophia-guo

> test-docker-ubuntu2004-armv7l-1 still has the same issue. See last Friday's (July 15) run: https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_arm_linux/160/console

Not sure what happened there - the configuration had been updated for that machine and the agent was showing as having been restarted on the 14th of July. I've just restarted it again and it's definitely got the new parameters so should be ok now. I'll go through them all and verify that they have been correctly started with the new options and restart any that haven't.

sxa avatar Jul 18 '22 13:07 sxa

https://ci.adoptopenjdk.net/computer/test%2Ddocker%2Dubuntu2004%2Dx64%2D1/log also needs a restart but is currently running a test (https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_extended.system_x86-64_linux/192/) so I'll restart that one later. Likewise https://ci.adoptopenjdk.net/computer/test%2Ddocker%2Dubuntu2010%2Dx64%2D2/log

[EDIT: Both now restarted]

sxa avatar Jul 18 '22 13:07 sxa

test-docker-ubuntu2004-armv7l-1 still has the issue https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.openjdk_arm_linux/162/console

sophia-guo avatar Jul 20 '22 15:07 sophia-guo

This feels like a similar issue: https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_extended.openjdk_aarch64_linux/85/

sophia-guo avatar Sep 02 '22 18:09 sophia-guo

java/lang/invoke/lambda/LambdaFileEncodingSerialization.java isn't excluded in jdk20 and caused this problem. I will exclude it. Does test-docker-ubuntu2004-armv8l-3 need to be reconfigured and restarted as before, @sxa?

sophia-guo avatar Mar 13 '23 21:03 sophia-guo

I had presumed that going forward, this PR https://github.com/adoptium/aqa-tests/pull/4344 should avoid this problem ?

smlambert avatar Mar 14 '23 00:03 smlambert

> I had presumed that going forward, this PR https://github.com/adoptium/aqa-tests/pull/4344 should avoid this problem ?

According to https://ci.adoptium.net/job/Test_openjdk20_hs_sanity.openjdk_aarch64_linux/76/ & https://ci.adoptium.net/job/Test_openjdk20_hs_sanity.openjdk_aarch64_linux/77/, the issue persists.

sophia-guo avatar Mar 14 '23 20:03 sophia-guo

It happened in https://github.com/adoptium/aqa-tests/blob/master/buildenv/jenkins/openjdk_tests#L314. I can try the same rm -rf here to see if it works https://ci.adoptium.net/job/Test_openjdk20_hs_sanity.openjdk_aarch64_linux/78/console
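One reason a plain `rm -rf` may succeed where the JVM-based cleanup fails is that it treats file names as raw bytes and never maps them to characters. The sketch below shows that idea in Python with a hypothetical helper, `remove_tree_bytes`; it is not the code in `openjdk_tests`, only an illustration of a byte-oriented recursive delete under that assumption.

```python
import os


def remove_tree_bytes(root: str) -> None:
    """Delete root and everything under it, handling names as raw bytes."""
    top = os.fsencode(root)
    # Walk bottom-up so each directory is empty by the time it is removed.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                os.unlink(path)  # symlink to a directory: remove the link only
            else:
                os.rmdir(path)
    os.rmdir(top)
```

Because `os.walk` is given a bytes path, every name it yields is also bytes, so no charset is involved at any point in the traversal.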

sophia-guo avatar Mar 14 '23 20:03 sophia-guo

Reran https://ci.adoptium.net/job/Test_openjdk20_hs_sanity.openjdk_aarch64_linux/78/console on the machine test-docker-ubuntu2004-armv8l-3 using this branch https://github.com/adoptium/aqa-tests/compare/master...sophia-guo:openjdk-tests:rm?expand=1 and didn't hit the `ERROR: Cannot delete workspace` issue. Has the cleanup already been done on this machine?

sophia-guo avatar Mar 15 '23 03:03 sophia-guo

@sophia-guo I wonder if the wsCleanup attempt needs this?:

notFailBuild: true,

andrew-m-leonard avatar Mar 16 '23 11:03 andrew-m-leonard

@andrew-m-leonard that definitely helps when cleanWs() runs at the end of the test build. But for this specific issue the error happened at the beginning of the test build, where we need a clean workspace to run the test build.

sophia-guo avatar Mar 16 '23 12:03 sophia-guo

Closing as a workaround has been implemented

sxa avatar Apr 11 '23 12:04 sxa