product-microgateway
Build failure is observed for arm64
Description:
We are trying to build this package for arm64. We tried building it on an Ubuntu arm64 machine and observed the following failures:
[ERROR] Failures:
[ERROR] APIKeyTestCase.start:60->BaseTestCase.init:135->BaseTestCase.initAndStartMicroGWServer:72 ? Runtime
[ERROR] AdvanceEndpointConfigTestCase>EndpointsByReferenceTestCase.start:64->BaseTestCase.init:148->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] BasicGrpcTestCase.start:61->BaseTestCase.init:160->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] HTTP2RequestsWithHTTP1BackEndTestCase>HTTP2RequestsWithHTTP2BackEndTestCase.setup:93->BaseTestCase.init:105->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] JavaInterceptorTestCase>InterceptorTestCase.start:55->BaseTestCase.init:160->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] JWTGenerationTestCase.start:70->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] JWTRevocationSupportTestCase.start:122->BaseTestCase.init:105->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] JwtTransformerTestCase.start:64->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] APIInvokeWithOAuth2andBasicAuthTestCase>APIInvokeWithOAuthTestCase.start:84->BaseTestCase.init:116->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] DisableSecurityAndCustomAuthHeaderTestCase>ScopesTestCase.start:81->BaseTestCase.init:160->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] MutualSSLTestCase>AuthenticationFailureTestCase.setup:74->BaseTestCase.init:105->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] DistributedThrottlingTestCase.start:145->init:48->BaseTestCase.init:105->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] OpenApiThrottlingTestCase>OASAPIInvokeTestCase.start:56->BaseTestCase.init:160->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] ThrottlingTestCase.start:135->BaseTestCase.init:116->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR] ValidationTestCase.start:48->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[INFO]
[ERROR] Tests run: 134, Failures: 15, Errors: 0, Skipped: 119
Please suggest how to solve this build error.
Steps to reproduce:
On an ARM64 server, run mvn clean install
Environment details (with versions):
- OS: ubuntu
Hi @odidev, we haven't tried the mgw build on arm64. However, if you can attach the complete Maven log for the test runs, we should be able to help you.
Please check the attached log: PGW_arm64_build_errorlog.txt
Hi @odidev, below is the root cause of the issue. We noticed it previously while looking for the best lightweight base image for our Docker images: some OpenJDK builds are missing some of the classes required by the netty HTTP/2 implementation, so we had to create a glibc-based JDK base image to fix it. If I remember correctly, what you have to do here is choose a different JDK that supports the netty HTTP/2 libs. @VirajSalaka, this is the same error we faced in #1011, right?
2020-07-15 05:09:23 INFO ServerInstance:192 - Waiting for port 9590 to open
2020-07-15 05:09:23 INFO ServerLogReader:114 - JAVA_HOME: /usr/lib/jvm/java-1.8.0-openjdk-arm64/
2020-07-15 05:09:28 INFO ServerLogReader:114 - error: java.lang.UnsatisfiedLinkError message=failed to load the required native library cause=error java.lang.IllegalArgumentException message=Failed to load any of the given libraries: [netty_tcnative_linux_aarch_64, netty_tcnative_linux_aarch_64_fedora, netty_tcnative_aarch_64, netty_tcnative]
@praminda Thanks for the quick response. Can you please suggest which JDK we should use for arm64?
Hmm, to be honest I haven't done proper R&D on the arm64 JDKs. Right now I don't have any suggestions for you 😞
@praminda I have also tested the build with Oracle JDK 8, but the same issue is seen. I have also tried changing netty_version in the pom.xml file, but the issue persists. I am still looking into it; it would really help if you could provide a link that helps resolve the issue.
@praminda, I have looked further into the netty_tcnative loading issue. My analysis is as follows:
1) We are seeing the error
error: java.lang.UnsatisfiedLinkError message=failed to load the required native library cause=error java.lang.IllegalArgumentException message=Failed to load any of the given libraries: [netty_tcnative_linux_aarch_64, netty_tcnative_linux_aarch_64_fedora, netty_tcnative_aarch_64, netty_tcnative]
because we are using netty-tcnative-boringssl-static 2.0.7.Final, which does not have arm64 support.
2) We need to use version 2.0.31.Final, which has arm64 support, but with this version we still have some issues:
- mgw is not able to pick up the arm64 library from the jar file
- If I manually install the netty-tcnative package, the library loading error is solved, but we then observe a core dump in OpenJDK. I have attached the dump information in the attached file.
We need to look into two things to resolve the build error with mgw:
- We need to find out why mgw is not able to pick up the arm64 library from the jar file
- To investigate the core dump, I need information from your side: which functionality/interfaces of netty-tcnative we are using, and which module in the mgw code uses netty-tcnative. With this information I can reproduce the core dump and raise a query with the netty-tcnative community.
Please let me know if any other information is required.
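One way to narrow down the first point is to check whether any of the library names from the error message is actually visible on the classpath. The following is a minimal sketch, not part of mgw or netty: the class TcnativeResourceProbe is hypothetical, and it assumes the native libraries are packaged under META-INF/native/ as lib<name>.so, which should be verified against the jar in use.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of product-microgateway or netty.
final class TcnativeResourceProbe {

    // Library names copied from the UnsatisfiedLinkError message in this thread.
    static final String[] CANDIDATES = {
        "netty_tcnative_linux_aarch_64",
        "netty_tcnative_linux_aarch_64_fedora",
        "netty_tcnative_aarch_64",
        "netty_tcnative"
    };

    // Assumption: native libraries are packaged as META-INF/native/lib<name>.so.
    static String resourcePath(String libName) {
        return "META-INF/native/lib" + libName + ".so";
    }

    // Returns the candidate names whose resource is visible to the given class loader.
    static List<String> findBundled(ClassLoader loader) {
        List<String> found = new ArrayList<>();
        for (String name : CANDIDATES) {
            URL url = loader.getResource(resourcePath(name));
            if (url != null) {
                found.add(name + " -> " + url);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("os.name=" + System.getProperty("os.name")
                + ", os.arch=" + System.getProperty("os.arch"));
        List<String> found = findBundled(TcnativeResourceProbe.class.getClassLoader());
        if (found.isEmpty()) {
            System.out.println("No netty-tcnative native library found on the classpath");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

Running this with the same classpath as the failing server would show which jar, if any, provides an aarch_64 library. An empty result would suggest the jar with the arm64 classifier is not being resolved at all.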
It seems that the issue below comes from a dependency, "ballerina-lang", which uses older netty (4.1.39.Final) and netty-tcnative-boringssl-static (2.0.25.Final) libraries that do not support AARCH64.
"INFO ServerLogReader:114 - error: java.lang.UnsatisfiedLinkError message=failed to load the required native library cause=error java.lang.IllegalArgumentException message=Failed to load any of the given libraries: [netty_tcnative_linux_aarch_64, netty_tcnative_linux_aarch_64_fedora, netty_tcnative_aarch_64, netty_tcnative]"
Netty 4.1.50.Final and netty-tcnative-boringssl-static 2.0.31.Final are the first versions to support AARCH64. Refer to the release notes for details: https://netty.io/news/2020/05/13/4-1-50-Final.html
@odidev Thanks for the findings. We'll have to get a release of "ballerina-lang" with an upgraded netty for this. Can you create an issue at ballerina-lang?
@praminda Thanks for the reply. I have raised an issue at ballerina-lang to upgrade netty-tcnative.
@praminda Ballerina is now available with ARM64 support from version 1.2.10, and product-microgateway has already been updated to version 1.2.12. So the native library for netty_tcnative_linux_aarch64 is now loaded successfully in the Linux/ARM64 job.
However, after updating the netty version to 4.1.50.Final or higher, I am still facing the port issues mentioned in the comments above, in both the AMD64 and ARM64 jobs. To build against the newer netty, I changed the type of the second argument of both “onPingRead()” and “onPingAckRead()” in the Http2Handler.java file from ‘ByteBuf’ to ‘long’. The failures look like this:
[ERROR] start(org.wso2.micro.gateway.tests.jwtRevocation.JWTRevocationSupportTestCase) Time elapsed: 59.613 s <<< FAILURE!
org.wso2.micro.gateway.tests.context.MicroGWTestException: Unable to start carbon server on port 9590 : Port already in use
at org.wso2.micro.gateway.tests.jwtRevocation.JWTRevocationSupportTestCase.start(JWTRevocationSupportTestCase.java:122)
[ERROR] setup(org.wso2.micro.gateway.tests.security.MutualSSLTestCase) Time elapsed: 59.622 s <<< FAILURE!
java.lang.AssertionError: Port: 9443 already in use. expected [false] but found [true]
[ERROR] setup(org.wso2.micro.gateway.tests.http2.HTTP2RequestsWithHTTP1BackEndTestCase) Time elapsed: 59.631 s <<< FAILURE!
java.lang.AssertionError: Port: 9443 already in use. expected [false] but found [true]
....
To confirm that the above ports are not in use, I created a fresh Docker container from ubuntu:bionic with the JDK set to openjdk-8-jdk, and used the ‘ss’ command below to check the availability of ports:
ss -tulw
This prints the active TCP/UDP ports. Before building the package, the output of the ss command is:
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
After updating netty, the ‘mvn clean install’ command failed with the port issues above, and the ss command then shows:
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
tcp LISTEN 0 100 0.0.0.0:9590 0.0.0.0:*
tcp LISTEN 0 100 0.0.0.0:9595 0.0.0.0:*
tcp LISTEN 0 100 0.0.0.0:9596 0.0.0.0:*
It shows that port 9590 is active, which is the port the carbon server fails to acquire. I also killed the process holding this port using the “fuser -k 9590/tcp” command and triggered the build again, but encountered the same port issues.
I thought there might be a multithreading issue, so I added a 30-millisecond delay in the checkPortAvailability function, but that did not help either. Instead, it disturbed the synchronization of port usage.
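The port state can also be probed from Java between test classes, independently of ss. The following is a minimal sketch using only the JDK; PortProbe, isPortFree and waitUntilFree are hypothetical names and are not the checkPortAvailability function in the mgw test suite. The port numbers in main are the ones that appear in this thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper for illustration; not the test suite's own port check.
final class PortProbe {

    // True if a listening socket can be bound to the port on all interfaces right now.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            // Disable address reuse so an existing listener always causes a bind failure.
            socket.setReuseAddress(false);
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Polls every 100 ms until the port is free or the timeout elapses.
    static boolean waitUntilFree(int port, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (isPortFree(port)) {
                return true;
            }
            Thread.sleep(100);
        }
        return isPortFree(port);
    }

    public static void main(String[] args) {
        int[] ports = {9443, 9590, 9595, 9596};
        for (int port : ports) {
            System.out.println(port + " free: " + isPortFree(port));
        }
    }
}
```

Polling with a bounded timeout rather than a single fixed delay can help distinguish a server that is slow to shut down from one that never releases the port, which is the situation the ss output above points to.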
I found 2 PRs raised previously to update the netty version: https://github.com/wso2/product-microgateway/pull/501 https://github.com/wso2/product-microgateway/pull/352 They were closed, and the comments do not give any concrete reasoning.
So, finally, I have 2 questions:
- Why were the PRs to update the netty version closed? Is there more work to be done to update netty in MGW? If so, can you please share some information so that I can contribute and move this work forward?
- Are you interested in accepting a PR that adds Linux/ARM64 jobs to Travis CI?
It would be really helpful to get pointers for solving the issues encountered after updating the netty version.
Hi @odidev, thanks for the findings. IIRC we closed these PRs due to some HTTP/2 test failures we had during the release period. You must be facing the same test failures, so if we are to upgrade netty, we'll have to fix those test failures too. If you would like to contribute the netty upgrade to the project, can you please look at the test failures and what causes them? That would be a big help. If you hit any blocker, we should be able to help you.
Our contribution guidelines need to be updated, but there is not much difference from what's mentioned here. The only thing is that you don't need Go to work with the latest master.
Also, please include the complete build log for the tests module so we can see where the error is actually happening.
@praminda Please find my logs below:
- Logs from the successfully passing 'master' branch with no netty update: Logs_With_Original_Netty_Version_in_MGW.txt
- Logs with ‘Netty’ updated to “4.1.50.Final”, ‘netty-tcnative-boringssl-static’ updated to “2.0.31.Final”, and the second argument of the functions “onPingRead()” and “onPingAckRead()” in the Http2Handler.java file changed from ‘ByteBuf’ to ‘long’: Logs_With_Netty_Updated_to_4.1.50.Final.txt
@praminda In MGW, after updating netty, I encountered the following error in the logs:
2021-01-08 10:47:27 WARN AmqpConnectionHandler:133 - Bad message received
java.lang.IllegalArgumentException: Unknown protocol name AMQP
at io.ballerina.messaging.broker.amqp.codec.handlers.AmqpDecoder.processProtocolInitFrame(AmqpDecoder.java:105)
at io.ballerina.messaging.broker.amqp.codec.handlers.AmqpDecoder.decode(AmqpDecoder.java:77)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
...
This error message comes from the dependency “io.ballerina.messaging.broker-launcher”. Here is the source of the error message we received in MGW via ballerina-message-broker: https://github.com/wso2-attic/ballerina-message-broker/blob/master/modules/broker-amqp/src/main/java/io/ballerina/messaging/broker/amqp/codec/handlers/AmqpDecoder.java#L104.
I have checked that this dependency uses an older version of netty, and I think that if we update netty in MGW, we also need to make the required changes in ballerina-message-broker so that it works with the latest netty. The blocker is that the “ballerina-message-broker” maintainers rolled out the last release (v0.970.5) in August 2018, and the repo has been unmaintained since then. We cannot submit issues or PRs, as mentioned in their readme.md here: https://github.com/wso2-attic/ballerina-message-broker#this-repository-is-no-longer-maintained.
So we cannot simply update netty here, because the latest (yet old) version of “ballerina-message-broker” conflicts with the latest netty version used in MGW.
I think we may need to replace this dependency to make the netty update work for MGW. Do you have any suggestions on this issue?
@odidev Great job finding the cause of the issue, thanks. We may have to get an upgrade from ballerina-mb or move to another AMQP client. @Asitha, do you have any feedback on what we can do here?
Are there any workarounds to get the microgateway running on ARM64? We are also facing the same issue. @praminda, any leads on this would be helpful.