noobaa-core

Cannot build noobaa-core on Power (ppc64le) architecture

Open YiannisGkoufas opened this issue 5 years ago • 10 comments

Environment info

  • NooBaa Version: 5.3.0
  • Architecture: ppc64le

Actual behavior

  1. My goal is to build noobaa-core for Power. I first made the necessary changes to the Dockerfiles to use the corresponding binaries and Docker images. The build currently fails when running npm run build:native. I think it has to do with the SSE2 flags in the GF256 library. The ongoing effort to build noobaa-core for Power is in this repo: https://github.com/YiannisGkoufas/noobaa-core. I used some wrappers from LLVM, but I am stuck.
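A common way to keep SSE2-dependent code buildable on non-x86 targets is to guard the intrinsics behind a compiler macro and provide a scalar fallback. The sketch below illustrates that pattern only. It is not the cm256/GF256 code, and xor_add is a hypothetical helper name. It assumes GCC or Clang, which define __SSE2__ when SSE2 is enabled, and relies on the fact that addition in GF(256) is a bytewise XOR.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h> // SSE2 intrinsics; this header only exists on x86 targets
#endif

// Hypothetical helper (not part of cm256): XOR `bytes` bytes of `src` into `dst`.
// Addition in GF(256) is a bytewise XOR, so this is the simplest kernel to show.
void xor_add(uint8_t* dst, const uint8_t* src, std::size_t bytes)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    // Vector path: 16 bytes per iteration, compiled only when SSE2 is available.
    for (; i + 16 <= bytes; i += 16) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_xor_si128(a, b));
    }
#endif
    // Scalar path: handles the tail on x86 and the whole buffer elsewhere (e.g. ppc64le).
    for (; i < bytes; ++i) {
        dst[i] ^= src[i];
    }
}
```

With this layout the same source compiles on ppc64le without any SSE2 flags, at the cost of the vector speedup.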

Expected behavior

  1. noobaa-core builds successfully.

Steps to reproduce

  1. Clone the repo on a Power machine
  2. Build it from source

More information - Screenshots / Logs / Other output

errors.txt

YiannisGkoufas avatar Feb 03 '20 14:02 YiannisGkoufas

Hi @YiannisGkoufas

Thank you for all that effort!

Unfortunately, we don't have a PPC machine for trying that, and we also don't have the PPC expertise to understand the errors you shared.

A few suggestions:

  1. A temporary way forward can be to build without cm256, and maybe also without Intel's ISA-L, which might be difficult to build on PPC (?). This means that Erasure Coding functionality will not be included, but the rest of the functionality should not be affected. To try it, comment out the following line and also the ISA-L line after it, then rebuild: https://github.com/noobaa/noobaa-core/blob/f8eb884322ddc17848eaaed1cf4c768c7620d9bc/src/native/nb_native.gyp#L36

  2. You should focus on compiling cm256 standalone from https://github.com/catid/cm256 and running the unit tests on PPC. If that works for you, the next step would be to contribute the needed changes to the cm256 project itself, and we will pull them into noobaa-core.

Let me know if I can help.

guymguym avatar Feb 04 '20 09:02 guymguym

Hi @guymguym

Apologies that I didn't get back to you earlier; I was just trying to get it to work. At this point I have managed to build the images for Power and have the deployment working on the OpenShift 3.11 cluster that we have (no 4.x for Power yet). Besides disabling the import of the gyp modules and some Dockerfile changes, I had to comment out the erasure-coding calls in src/native/chunk/coder.h, src/native/chunk/coder_napi.cpp, and src/native/chunk/coder.cpp (check the updated version in https://github.com/YiannisGkoufas/noobaa-core for details).

I did a small test:

noobaa backingstore create pv-pool bstest --num-volumes 3 --pv-size-gb=1 --storage-class managed-nfs-storage
noobaa bucketclass create testbc --backingstores=bstest
noobaa obc create testbucket --app-namespace noobaa-tests --bucketclass testbc

and it's stuck on the last step. If I check the status, I see:

[johngouf@oc4204838823 hybrid-storage]$ ./noobaa obc status testbucket
INFO[0000] ✅ Exists: ObjectBucketClaim "testbucket"     
INFO[0000] ❌ Not Found: ObjectBucket "obc-noobaa-tests-testbucket" 
INFO[0000] ❌ Not Found: ConfigMap "testbucket"          
INFO[0000] ❌ Not Found: Secret "testbucket"             
INFO[0000] ✅ Exists: StorageClass "noobaa-tests.noobaa.io" 
INFO[0000] ✅ Exists: BucketClass "testbc"               

ObjectBucketClaim info:
  Phase                  : Pending
  ObjectBucketClaim      : kubectl get -n noobaa-tests objectbucketclaim testbucket
  ConfigMap              : kubectl get -n noobaa-tests configmap testbucket
  Secret                 : kubectl get -n noobaa-tests secret testbucket
  ObjectBucket           : kubectl get objectbucket obc-noobaa-tests-testbucket
  StorageClass           : kubectl get storageclass noobaa-tests.noobaa.io
  BucketClass            : kubectl get -n noobaa-tests bucketclass testbc

Connection info:

Shell commands:
  AWS S3 Alias           : alias s3='aws s3 --no-verify-ssl --endpoint-url https://:'

Any hints about what error I should be looking for? The only error that I can see in the logs of the core pod is:

Mar-6 11:30:03.420 [WebServer/47]    [L0] core.rpc.rpc_n2n:: _close undefined
Mar-6 11:30:03.420 [WebServer/47]    [L0] core.rpc.rpc_base_conn:: RPC CONN CLOSE ON ERROR n2n://5e5fc0f6aba9140025fa90a9(1jx9397e8r) [Error: N2N ICE CLOSED]
Mar-6 11:30:03.421 [HostedAgents/32]    [L0] core.rpc.rpc_base_conn:: RPC CONN CLOSE ON ERROR n2n://nodes_monitor(1jx940vng6) [Error: N2N ICE CLOSED]
Mar-6 11:30:03.421 [HostedAgents/32]    [L0] core.rpc.rpc_n2n:: _close undefined
Mar-6 11:30:03.421 [HostedAgents/32]    [L0] core.rpc.ice:: TLS CLOSED: tcp://[10.131.0.133]:60100=>tcp://[::ffff:10.131.0.133]:55474

I am not sure whether this is relevant or not. Thanks for your help!

YiannisGkoufas avatar Mar 06 '20 11:03 YiannisGkoufas

Hey @YiannisGkoufas, you should look for errors in the noobaa-operator logs.

guymguym avatar Mar 06 '20 20:03 guymguym

Thanks @guymguym. I suspect this happens because OpenShift 3.11 is based on an old Kubernetes version (v1.11.x, I think). I will try to test it on a 1.15+ k8s cluster and see if anything changes.

YiannisGkoufas avatar Mar 12 '20 09:03 YiannisGkoufas

@YiannisGkoufas We support OCP 3.11, so I'm not sure this is the reason...

guymguym avatar Mar 12 '20 10:03 guymguym

Oh okay, will check the operator logs, thanks for letting me know!

YiannisGkoufas avatar Mar 12 '20 10:03 YiannisGkoufas

Hi @YiannisGkoufas, can you please rebase onto master and try to build now? #6082 and #6097 should bring support for ppc64le and s390x.

Let me know how it goes.

liranmauda avatar Jul 28 '20 07:07 liranmauda

Hi @YiannisGkoufas did you get a chance to rebase and retry?

nimrod-becker avatar Aug 10 '20 10:08 nimrod-becker

Closing as we are building this now on ppc64le

liranmauda avatar Aug 11 '20 12:08 liranmauda

I tried to compile NooBaa core with npm run build:native on the Power architecture (more info about my environment below) and hit the following errors:

First error:

../src/native/chunk/splitter.cpp: In constructor 'noobaa::Splitter::Splitter(int, int, int, bool, bool)':
../src/native/chunk/splitter.cpp:47:27: error: ignoring return value of 'int posix_memalign(void**, size_t, size_t)', declared with attribute warn_unused_result [-Werror=unused-result]
             posix_memalign((void**)&_md5_mb_mgr, 16, sizeof(MD5_HASH_CTX_MGR));
             ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/native/chunk/splitter.cpp:48:27: error: ignoring return value of 'int posix_memalign(void**, size_t, size_t)', declared with attribute warn_unused_result [-Werror=unused-result]
             posix_memalign((void**)&_md5_mb_ctx, 16, sizeof(MD5_HASH_CTX));
             ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This can be fixed with the following changes:

diff --git a/src/native/chunk/splitter.cpp b/src/native/chunk/splitter.cpp
index d06b28ef5..77db1dc01 100644
--- a/src/native/chunk/splitter.cpp
+++ b/src/native/chunk/splitter.cpp
@@ -44,8 +44,8 @@ Splitter::Splitter(
     if (_calc_md5) {
         extern bool fips_mode;
         if (fips_mode) {
-            posix_memalign((void**)&_md5_mb_mgr, 16, sizeof(MD5_HASH_CTX_MGR));
-            posix_memalign((void**)&_md5_mb_ctx, 16, sizeof(MD5_HASH_CTX));
+            if (posix_memalign((void**)&_md5_mb_mgr, 16, sizeof(MD5_HASH_CTX_MGR)) > 0){};
+            if (posix_memalign((void**)&_md5_mb_ctx, 16, sizeof(MD5_HASH_CTX)) > 0){};
             md5_ctx_mgr_init(_md5_mb_mgr);
             hash_ctx_init(_md5_mb_ctx);
             md5_mb_submit_and_flush(0, 0, HASH_FIRST);
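The `> 0` comparison with an empty body silences -Werror=unused-result, but it still ignores an allocation failure: posix_memalign returns 0 on success and an error number (such as EINVAL or ENOMEM) otherwise. A minimal sketch of handling the return value is shown below. The helper name aligned_alloc_or_throw is hypothetical, and whether throwing is appropriate inside the Splitter constructor is an assumption that would need checking against how noobaa-core reports native errors.

```cpp
#include <stdlib.h> // posix_memalign, free
#include <cstddef>
#include <cstdint>
#include <new>      // std::bad_alloc

// Hypothetical helper (not in noobaa-core): allocate `size` bytes aligned to
// `alignment`, or throw if posix_memalign reports an error.
void* aligned_alloc_or_throw(std::size_t alignment, std::size_t size)
{
    void* ptr = nullptr;
    // posix_memalign returns 0 on success and an errno-style code on failure;
    // it does not set errno, so the return value is the only error signal.
    const int rc = posix_memalign(&ptr, alignment, size);
    if (rc != 0) {
        throw std::bad_alloc();
    }
    return ptr;
}
```

Under that assumption, a call site could look like `_md5_mb_mgr = static_cast<MD5_HASH_CTX_MGR*>(aligned_alloc_or_throw(16, sizeof(MD5_HASH_CTX_MGR)));`, with the memory later released through free().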

Second error (occurs after the above patch is in place):

In file included from /usr/include/fcntl.h:290:0,
                 from /root/.cache/node-gyp/14.16.1/include/node/uv/unix.h:27,
                 from /root/.cache/node-gyp/14.16.1/include/node/uv.h:66,
                 from ../src/native/fs/fs_napi.cpp:6:
In function 'int open(const char*, int, ...)',
    inlined from 'virtual void noobaa::Writefile::Work()' at ../src/native/fs/fs_napi.cpp:329:56:
/usr/include/powerpc64le-linux-gnu/bits/fcntl2.h:50:24: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
    __open_missing_mode ();
    ~~~~~~~~~~~~~~~~~~~~^~

This can be fixed with the following changes:

--- a/src/native/fs/fs_napi.cpp
+++ b/src/native/fs/fs_napi.cpp
@@ -326,7 +326,7 @@ struct Writefile : public FSWorker
     }
     virtual void Work()
     {
-        int fd = open(_path.c_str(), O_WRONLY | O_CREAT);
+        int fd = open(_path.c_str(), O_WRONLY | O_CREAT, 0755);
         if (fd < 0) {
             SetSyscallError();
             return;
@@ -889,4 +889,4 @@ fs_napi(Napi::Env env, Napi::Object exports)
     exports["fs"] = exports_fs;
 }

-}
\ No newline at end of file
+}
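The header in the error message (bits/fcntl2.h) belongs to glibc's fortified wrappers, which reject at compile time an open() call that passes O_CREAT without a mode argument. That suggests the error depends on the toolchain's fortify settings rather than on ppc64le itself, though this has not been verified on other platforms here. The sketch below shows the three-argument form with a non-executable mode. The helper name open_for_write and the choice of rw-r--r-- (0644) instead of 0755 are assumptions, picked because a regular data file usually does not need execute bits.

```cpp
#include <fcntl.h>     // open, O_WRONLY, O_CREAT
#include <sys/stat.h>  // S_IRUSR and friends, fstat
#include <sys/types.h> // mode_t
#include <unistd.h>    // close, unlink, getpid
#include <string>

// Hypothetical helper (not in noobaa-core): open `path` for writing, creating
// it if needed. When O_CREAT is set, the third argument is required; the
// effective permissions are `mode & ~umask`.
int open_for_write(const std::string& path)
{
    const mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH; // rw-r--r-- (0644)
    return open(path.c_str(), O_WRONLY | O_CREAT, mode);
}
```

As with the original code, the caller checks for a negative return value and reads errno on failure.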

With these two patches, npm run build:native completes successfully on ppc64le, and I can confirm that afterwards I am able to run every component, such as npm run web, npm run bg, npm run hosted_agents, and npm run s3.

I would like to know whether these errors are limited to my environment or occur on ppc64le in general. If it is the latter and you agree with these changes, I would be more than happy to open a pull request for them.

Environment Info

  • NooBaa version: master at commit (https://github.com/noobaa/noobaa-core/commit/d639630781307daa2d553374072412674938d219)
  • Architecture: ppc64le
  • Distro: Ubuntu 18.04.5 LTS
  • GCC toolchain version (gcc/g++): 7.5.0
  • Node version: 14.16.1
  • npm version: 6.14.12

pkoutsov avatar May 07 '21 13:05 pkoutsov

We are building for ppc, so closing this.

nimrod-becker avatar Apr 17 '23 16:04 nimrod-becker