Build failure is observed for arm64

Open odidev opened this issue 4 years ago • 16 comments

Description:

We are trying to build this package for arm64. We built it on an Ubuntu arm64 machine and observed the following failure:

[ERROR] Failures:
[ERROR]   APIKeyTestCase.start:60->BaseTestCase.init:135->BaseTestCase.initAndStartMicroGWServer:72 ? Runtime
[ERROR]   AdvanceEndpointConfigTestCase>EndpointsByReferenceTestCase.start:64->BaseTestCase.init:148->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   BasicGrpcTestCase.start:61->BaseTestCase.init:160->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   HTTP2RequestsWithHTTP1BackEndTestCase>HTTP2RequestsWithHTTP2BackEndTestCase.setup:93->BaseTestCase.init:105->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   JavaInterceptorTestCase>InterceptorTestCase.start:55->BaseTestCase.init:160->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   JWTGenerationTestCase.start:70->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   JWTRevocationSupportTestCase.start:122->BaseTestCase.init:105->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   JwtTransformerTestCase.start:64->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   APIInvokeWithOAuth2andBasicAuthTestCase>APIInvokeWithOAuthTestCase.start:84->BaseTestCase.init:116->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   DisableSecurityAndCustomAuthHeaderTestCase>ScopesTestCase.start:81->BaseTestCase.init:160->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   MutualSSLTestCase>AuthenticationFailureTestCase.setup:74->BaseTestCase.init:105->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   DistributedThrottlingTestCase.start:145->init:48->BaseTestCase.init:105->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   OpenApiThrottlingTestCase>OASAPIInvokeTestCase.start:56->BaseTestCase.init:160->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   ThrottlingTestCase.start:135->BaseTestCase.init:116->BaseTestCase.init:89->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[ERROR]   ValidationTestCase.start:48->BaseTestCase.init:131->BaseTestCase.initHttpServer:47 Port: 9443 already in use. expected [false] but found [true]
[INFO]
[ERROR] Tests run: 134, Failures: 15, Errors: 0, Skipped: 119

Please suggest how to resolve this build error.

Steps to reproduce:

Run mvn clean install on an ARM64 server.

Environment details (with versions):

  • OS: ubuntu

odidev avatar Jul 14 '20 14:07 odidev

Hi @odidev, we haven't tried the mgw build on arm64. However, if you can attach the complete Maven log for the test runs, we should be able to help you.

praminda avatar Jul 15 '20 03:07 praminda

Please check the attached log: PGW_arm64_build_errorlog.txt

odidev avatar Jul 15 '20 05:07 odidev

Hi @odidev, below is the root cause of the issue. We noticed it previously while looking for the best lightweight base image for our Docker images: some OpenJDK builds are missing some of the classes required by the Netty HTTP/2 implementation, so we had to create a glibc-based JDK base image to fix it. If I remember correctly, what you have to do here is choose a different JDK with support for the Netty HTTP/2 libs. @VirajSalaka, this is the same error we faced in #1011, right?

2020-07-15 05:09:23 INFO  ServerInstance:192 - Waiting for port 9590 to open
2020-07-15 05:09:23 INFO  ServerLogReader:114 - JAVA_HOME: /usr/lib/jvm/java-1.8.0-openjdk-arm64/
2020-07-15 05:09:28 INFO  ServerLogReader:114 - error: java.lang.UnsatisfiedLinkError message=failed to load the required native library cause=error java.lang.IllegalArgumentException message=Failed to load any of the given libraries: [netty_tcnative_linux_aarch_64, netty_tcnative_linux_aarch_64_fedora, netty_tcnative_aarch_64, netty_tcnative]
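
The four library names in the error follow a pattern of OS and architecture suffixes. As a rough illustration (this is not Netty's actual loader code), the sketch below rebuilds that candidate list from an OS name and a raw os.arch value. The class and method names are hypothetical, the normalization only covers the two architectures discussed in this thread, and adding the "_fedora" variant only on Linux is an assumption based on the log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; it mimics the names seen in the log,
// it is not the code Netty uses to locate netty-tcnative.
final class TcnativeNames {

    // Maps a raw os.arch value to the suffix style seen in the log (e.g. "aarch_64").
    static String normalizeArch(String osArch) {
        String a = osArch.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "");
        if (a.equals("aarch64") || a.equals("arm64")) {
            return "aarch_64";
        }
        if (a.equals("amd64") || a.equals("x8664")) {
            return "x86_64";
        }
        return a; // other architectures are passed through unchanged (assumption)
    }

    // Builds the ordered list of library names that would be tried.
    static List<String> candidates(String os, String arch) {
        String base = "netty_tcnative";
        List<String> names = new ArrayList<>();
        names.add(base + "_" + os + "_" + arch);
        if (os.equals("linux")) {
            names.add(base + "_" + os + "_" + arch + "_fedora");
        }
        names.add(base + "_" + arch);
        names.add(base);
        return names;
    }

    public static void main(String[] args) {
        String arch = normalizeArch(System.getProperty("os.arch"));
        for (String name : candidates("linux", arch)) {
            System.out.println(name);
        }
    }
}
```

Running it on the failing machine and comparing the printed names with the native libraries actually shipped in the netty-tcnative jar shows quickly whether an aarch_64 binary is missing.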

praminda avatar Jul 15 '20 06:07 praminda

@praminda Thanks for the quick response. Can you please suggest which JDK we should use for arm64?

odidev avatar Jul 15 '20 06:07 odidev

Hmm, to be honest I haven't done proper R&D on the arm64 JDKs. Right now I don't have any suggestions for you 😞

praminda avatar Jul 15 '20 10:07 praminda

@praminda I have also tested the build with Oracle JDK 8, but the same issue occurs. I have also tried changing netty_version in the pom.xml file, but the issue persists. I am looking into it; it would really help if you could also provide some links on resolving the issue.

odidev avatar Jul 15 '20 15:07 odidev

@praminda, I have looked further into the netty_tcnative loading issue. My analysis follows.

  1. We are seeing the error

error: java.lang.UnsatisfiedLinkError message=failed to load the required native library cause=error java.lang.IllegalArgumentException message=Failed to load any of the given libraries: [netty_tcnative_linux_aarch_64, netty_tcnative_linux_aarch_64_fedora, netty_tcnative_aarch_64, netty_tcnative]

because we are using netty-tcnative-boringssl-static 2.0.7.Final, which does not have arm64 support.

  2. We need to use version 2.0.31.Final, which has arm64 support, but with it we still have some issues:

  • mgw is not able to pick up the arm64 library from the jar file.
  • If I manually install the netty-tcnative package, the library loading error is resolved, but we observe a core dump from OpenJDK. I have attached the dump information in the attached file. Do let me know if any other information is required.

We need to look into two things to resolve the build error with mgw:

  1. We need to find out why mgw is not able to pick up the arm64 library from the jar file.
  2. To investigate the core dump, I need information from your side: which functionality/interfaces of netty-tcnative are we using, and which module in the mgw code uses netty-tcnative? With this information I can reproduce the core dump and raise a query with the netty-tcnative community.
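
For the first point, one quick check is whether the jar on the classpath actually ships an arm64 binary. Netty bundles its native libraries under META-INF/native/ inside the jar, so the following minimal sketch lists those entries for any jar path passed on the command line. The class name is hypothetical, and the sketch only inspects the jar; it does not explain why mgw fails to load a library that is present.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical diagnostic helper; not part of product-microgateway or Netty.
final class NativeLibLister {

    // Returns the names of all files stored under META-INF/native/ in the given jar.
    static List<String> nativeEntries(String jarPath) throws IOException {
        List<String> found = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith("META-INF/native/")) {
                    found.add(entry.getName());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more jar paths, e.g. the netty-tcnative-boringssl-static jar.
        for (String jarPath : args) {
            System.out.println(jarPath + ":");
            for (String name : nativeEntries(jarPath)) {
                System.out.println("  " + name);
            }
        }
    }
}
```

If the output for the jar that mgw resolves contains no aarch_64 entry, the older netty-tcnative version is still the one being packaged.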

Please do let me know if any other information is required.

hs_err_pid50767.log

odidev avatar Jul 17 '20 14:07 odidev

It seems that the issue below comes from the dependency "ballerina-lang", which uses older netty (4.1.39.Final) and netty-tcnative-boringssl-static (2.0.25.Final) libraries that do not support AARCH64: "INFO ServerLogReader:114 - error: java.lang.UnsatisfiedLinkError message=failed to load the required native library cause=error java.lang.IllegalArgumentException message=Failed to load any of the given libraries: [netty_tcnative_linux_aarch_64, netty_tcnative_linux_aarch_64_fedora, netty_tcnative_aarch_64, netty_tcnative]"

Netty 4.1.50.Final and netty-tcnative-boringssl-static 2.0.31.Final are the first releases to support AARCH64. Refer to the release notes for details: https://netty.io/news/2020/05/13/4-1-50-Final.html

odidev avatar Oct 09 '20 16:10 odidev

@odidev Thanks for the findings. We'll have to get a release of "ballerina-lang" with an upgraded netty for this. Can you create an issue at ballerina-lang?

praminda avatar Oct 15 '20 02:10 praminda

@praminda Thanks for the reply. I have raised an issue at ballerina-lang to upgrade netty-tcnative.

odidev avatar Oct 15 '20 16:10 odidev

@praminda Ballerina supports ARM64 from version 1.2.10 onwards, and product-microgateway has already been updated to version 1.2.12. So the native library netty_tcnative_linux_aarch64 now loads successfully in the Linux/ARM64 job.

However, after updating the netty version to 4.1.50.Final or higher, I am still facing the port issues mentioned in the comments above, in both the AMD64 and ARM64 jobs. To support the newer netty, I changed the type of the second argument of both “onPingRead()” and “onPingAckRead()” in the Http2Handler.java file from ‘ByteBuf’ to ‘long’. The failures are as below:

[ERROR] start(org.wso2.micro.gateway.tests.jwtRevocation.JWTRevocationSupportTestCase)  Time elapsed: 59.613 s  <<< FAILURE!
org.wso2.micro.gateway.tests.context.MicroGWTestException: Unable to start carbon server on port 9590 : Port already in use
        at org.wso2.micro.gateway.tests.jwtRevocation.JWTRevocationSupportTestCase.start(JWTRevocationSupportTestCase.java:122)

[ERROR] setup(org.wso2.micro.gateway.tests.security.MutualSSLTestCase)  Time elapsed: 59.622 s  <<< FAILURE!
java.lang.AssertionError: Port: 9443 already in use. expected [false] but found [true]

[ERROR] setup(org.wso2.micro.gateway.tests.http2.HTTP2RequestsWithHTTP1BackEndTestCase)  Time elapsed: 59.631 s  <<< FAILURE!
java.lang.AssertionError: Port: 9443 already in use. expected [false] but found [true]
....

To confirm that the above ports are not in use, I created a fresh Docker container with ubuntu:bionic and the JDK set to openjdk-8-jdk. I used the ‘ss’ command below to check the availability of ports:

ss -tulw

This prints the listening TCP, UDP and raw sockets. Before building the package, the output of the ss command is:

Netid         State          Recv-Q          Send-Q          Local Address:Port          Peer Address:Port

After updating netty, the ‘mvn clean install’ command failed with the port issues above, and the ss command then shows:

Netid             State               Recv-Q              Send-Q                            Local Address:Port                           Peer Address:Port
tcp               LISTEN              0                   100                                     0.0.0.0:9590                                0.0.0.0:*
tcp               LISTEN              0                   100                                     0.0.0.0:9595                                0.0.0.0:*
tcp               LISTEN              0                   100                                     0.0.0.0:9596                                0.0.0.0:*

It shows that something is still listening on port 9590, which the carbon server then fails to acquire. I also killed the process holding this port using the “fuser -k 9590/tcp” command and triggered the build again, but encountered the same port issues.

I thought there might be a multithreading issue, so I added a 30-millisecond delay in the checkPortAvailability function, but that did not help either; rather, it disturbed the synchronization of port usage.

I see that two PRs were raised previously to update the netty version: https://github.com/wso2/product-microgateway/pull/501 https://github.com/wso2/product-microgateway/pull/352 These were closed, and the comments do not give any concrete reason.
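
To narrow down when the ports are released between test classes, a small standalone probe can be run alongside the build. The sketch below is a hypothetical helper, not the checkPortAvailability code from the test module: it treats a port as in use when binding a ServerSocket to it fails, and polls until a deadline instead of sleeping for a fixed interval. The port numbers in main are the ones reported in this thread.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical probe for illustration; not the BaseTestCase logic in product-microgateway.
final class PortProbe {

    // Returns true if the port cannot be bound, i.e. something is already listening on it.
    static boolean isPortInUse(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    // Polls every 100 ms until the port can be bound or the timeout elapses.
    static boolean waitUntilFree(int port, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (!isPortInUse(port)) {
                return true;
            }
            Thread.sleep(100);
        }
        return !isPortInUse(port);
    }

    public static void main(String[] args) {
        // Ports taken from the test failures and the ss output in this thread.
        for (int port : new int[] {9443, 9590, 9595, 9596}) {
            System.out.println(port + " in use: " + isPortInUse(port));
        }
    }
}
```

Polling against a deadline avoids guessing a delay value: if a port never frees up within the timeout, a server process from an earlier test class is most likely still running rather than just slow to shut down.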

So, finally, I have two questions:

  1. Why were the PRs to update the netty version closed? Is there more work to be done to update netty in MGW? If so, can you please give me some information so that I can contribute and move my work forward?
  2. Are you interested in accepting a PR that adds Linux/ARM64 jobs to Travis CI?

It would be really helpful if I could get pointers for solving the issues encountered after updating the netty version.

odidev avatar Jan 07 '21 06:01 odidev

Hi @odidev, thanks for the findings. IIRC we closed these PRs due to some HTTP/2 test failures we had during the release period. You must be facing the same test failures. So if we are to upgrade netty, we'll have to fix the test failures too. If you would like to contribute the netty upgrade, can you please look at the test failures and what causes them? That would be a big help. If you hit any blockers, we should be able to help you.

Our contribution guidelines need to be updated, but there is nothing much different from what's mentioned here. The only difference is that you don't need Go to work with the latest master.

Also, please include the complete build log for the tests module so we can see where the error actually happens.

praminda avatar Jan 08 '21 09:01 praminda

@praminda Please find my logs below:

  1. Logs from the 'master' branch, which passes with no netty update: Logs_With_Original_Netty_Version_in_MGW.txt

  2. Logs with ‘Netty’ updated to “4.1.50.Final”, ‘netty-tcnative-boringssl-static’ updated to “2.0.31.Final”, and the second argument of the functions “onPingRead()” and “onPingAckRead()” in the Http2Handler.java file changed from ‘ByteBuf’ to ‘long’: Logs_With_Netty_Updated_to_4.1.50.Final.txt

odidev avatar Jan 08 '21 11:01 odidev

@praminda In MGW, after updating netty, I encountered the following error in the logs:

2021-01-08 10:47:27 WARN  AmqpConnectionHandler:133 - Bad message received
java.lang.IllegalArgumentException: Unknown protocol name AMQP
        at io.ballerina.messaging.broker.amqp.codec.handlers.AmqpDecoder.processProtocolInitFrame(AmqpDecoder.java:105)
        at io.ballerina.messaging.broker.amqp.codec.handlers.AmqpDecoder.decode(AmqpDecoder.java:77)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
        ...

This error message comes from the dependency “io.ballerina.messaging.broker-launcher”. Here is the source of the error message we received in MGW, in ballerina-message-broker: https://github.com/wso2-attic/ballerina-message-broker/blob/master/modules/broker-amqp/src/main/java/io/ballerina/messaging/broker/amqp/codec/handlers/AmqpDecoder.java#L104.

I have checked that this dependency uses an older version of netty, and I think that if we update netty in MGW, we also need to make the required changes in ballerina-message-broker so that it works with the latest netty. The blocker is that the “ballerina-message-broker” maintainers rolled out the last release (v0.970.5) in August 2018, and the repo has been unmaintained since. We cannot submit issues or PRs, as mentioned in their readme.md here: https://github.com/wso2-attic/ballerina-message-broker#this-repository-is-no-longer-maintained.

We clearly cannot update netty here, as the latest (yet old) version of “ballerina-message-broker” conflicts with the latest netty version used in MGW.

I think we may need to replace this dependency to make netty update work for MGW. Do you have any suggestions on this issue?

odidev avatar Jan 11 '21 08:01 odidev

@odidev Great job finding the cause of the issue, thanks. We may have to get an upgrade from ballerina-mb or move to another AMQP client. @Asitha, do you have any feedback on what we can do here?

praminda avatar Jan 12 '21 03:01 praminda

Any workarounds to get the microgateway running on ARM64? We are also facing the same issue. @praminda, any leads on this would be helpful.

lokkeshjaya avatar Sep 25 '21 18:09 lokkeshjaya