openfast icon indicating copy to clipboard operation
openfast copied to clipboard

Use Cmake OBJECT libraries to create openfastlib and add option to use downloaded reference lapack and blas sources

Open reos-rcrozier opened this issue 3 years ago • 8 comments

Feature or improvement description

Updates to the build system to facilitate building on Linux, particularly with Simulink support. OpenFAST cmake files have been modified to properly make use of the cmake OBJECT library type to create the openfast library. In addition, the option to build a local version of the official LAPACK and BLAS sources has been added through the use of the Cmake command ExternalProject_Add. A new boolean variable has been added, USE_LOCAL_STATIC_LAPACK. If set to ON, the latest LAPACK and BLAS soures are downloaded and build locally. The install command always installs these locally build libraries in ${CMAKE_SOURCE_DIR}/install/lib, i.e. regardless of the value of CMAKE_INSTALL_PREFIX to avoid user's accidentally overwriting system LAPACK and BLAS libraries. The purpose of this option is primarily to provide static LAPACK libraries to link to the FAST_Sfunc mex file, to avoid load time issues with shared libraries, but it will also allow testing with the latest LAPACK.

Related issue, if one exists

#924, #556, #482, #926, probably others

Impacted areas of the software

Only cmake files and create_FAST_SFunc.m

Additional supporting information

When building on Linux with Simulink support, there is an issue at load time with libraries being shipped with Matlab incorrectly being loaded when loading openfast as a shared library. These include libraries such as lapack and blas. A solution might be to build the openfast library as a static library, however, the current method of doing that on Linux is broken because the openfast library is created by linking together a bunch of other libraries created during the build. This is ok for shared libraries, but is incorrect on linux for static libraries. On Linux, the linker when creating a static library is ar. ar is just an archiver and doesn't really do a lot more than take a bunch of object files and stuff them in the static library (at least by default with Cmake). ar cannot link an existing library to the static library, one would have to extract the object files from the static library and then use ar to add them. With the current build system, when you try to link to the created static library on Linux, you get undefined references, because ar hasn't actually linked any of the sub-libraries like aerodyn etc.

Cmake have provided a mechanism to deal with this, which is OBJECT libraries. OBJECT libraries allow you to break up libraries into smaller modules as in OpenFAST, but link them correctly at the higher level. With this PR we have modified OpenFAST to use object libraries throughout.

This solves the issue with undefined references due to the missing OpenFAST modules, but, it also means that when later linking to libopenfastlib.a one must remember to link to the gfortran, quadmath, dl, lapack and blas libraries to avoid undefined references from these libraries (which are linked by default if building a shared library). This presents a further problem when linking a mex file on Linux as the Matlab supplied LAPACK and BLAS shared libraries are not the same version as the versions which are linked by default by the mex command. For shared libraries the problem is not apparent at link time, but appears at load time when differences in the interfaces cause errors when running the FAST_SFunc command. The solution is to link to static LAPACK and BLAS libraries when creating the mex file. For this reason the USE_LOCAL_STATIC_LAPACK options described above was created. This builds the LAPACK and BLAS libraries locally, and installs them in a place which is easy for the mex linker to find, in the OpenFAST source tree.

Some basic modifications to create_FAST_SFunc.m to make use of these modification have been made. On non-windows platforms it links to gfortran, lapack, blas, quadmath and dt by default to avoid undefined references. The default install directory is also modified to ../../install as on windows, mainly to allow easy linking to static LAPACK installed in these directories. Therefore on linux, an appropriate cmake command is something like:

cmake -LH /home/username/source/path/openfast -DBUILD_FASTFARM=ON -DBUILD_OPENFAST_CPP_API=ON -DBUILD_OPENFAST_SIMULINK_API=ON -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/home/username/source/path/openfast/install -DDOUBLE_PRECISION=ON -DOPENMP=OFF -DUSE_LOCAL_STATIC_LAPACK=ON

reos-rcrozier avatar Feb 22 '22 10:02 reos-rcrozier

FYI a backport of these modifications to OpenFAST v3.0.0 is available on our branch here

reos-rcrozier avatar Feb 22 '22 10:02 reos-rcrozier

looking at the output of the runs, it seems we might have missed specifying a target dependency for versioninfolib_obj. Probably being picked up because the order of compilation is slightly different than on our test machine.

reos-rcrozier avatar Mar 01 '22 20:03 reos-rcrozier

@reos-rcrozier Thanks for this pull request and my apologies that it has sat for a few months. I think I understand the motivation and how the code changes solve the problem. Have you worked through the issue with the version lib?

rafmudaf avatar May 03 '22 20:05 rafmudaf

@reos-rcrozier Thanks for this pull request and my apologies that it has sat for a few months. I think I understand the motivation and how the code changes solve the problem. Have you worked through the issue with the version lib?

No I haven't, mainly because I can't replicate it, and am very squeezed for time. It could be weeks before I have a chance to look into it properly, unless the same issue crops up when we go to build on a new workstation we have.

reos-rcrozier avatar May 03 '22 23:05 reos-rcrozier

@rafmudaf , I was able to replicate the issue on a parallel build. It seems that there can be an issue with cmake if a fortran source file is built more than once. I have modified the cmake files to ensure this was the case.

I also made a modification to reduce problems which occur when building with cmake option BUILD_OPENFAST_SIMULINK_API=ON. The mere act of linking to the mex libraries can cause the standard libraries shipped with matlab to be linked to throughout the project. Since the nwtcs library is linked to the mex libraries, and is used throughout the project the whole build can be polluted in this way, resulting in failures due to mismated library versions. I threrfore made changes to minimize as much as possible the places where the mex libraries are linked. In fat now two openfast libraries are built, openfastlib and openfastlib_mex, the latter being intended to be linked to the simulink interface.

Everything compiles now, including the simulink S-Function, however, I've not actually tested running anything yet.

reos-rcrozier avatar May 19 '22 23:05 reos-rcrozier

@rafmudaf Could you now approve the testing workflow please?

I have now tested that everything seems to be working, by doing some basic simulations, I've not run the test suite. I also fixed an issue building the openfastcpp lib due to missing OpenMP compiler flags.

reos-rcrozier avatar May 23 '22 14:05 reos-rcrozier

@rafmudaf Could you now approve the testing workflow please?

@reos-rcrozier done and its running

rafmudaf avatar May 23 '22 15:05 rafmudaf

ok, 6 out of 7 test now passing, just the interface-tests. What's different about this test? The error seems to be about multiply defined functions in the sys file, but why is it only showing up in this test?

reos-rcrozier avatar May 23 '22 22:05 reos-rcrozier

While using OpenFAST 3.1.0, this pull request was a lifesaver for us. But we now upgraded to OpenFAST 3.3.0, where we recreated the changes from this pull request. In case anybody is interested you can find it here.

AaronTUB avatar Jan 05 '23 08:01 AaronTUB

It seems this is useful, can we trigger a workflow please, I haven't tested locally yet.

reos-rcrozier avatar Jan 12 '23 12:01 reos-rcrozier

@reos-rcrozier I've enabled the automated tests. Please @ me directly if you need a maintainer to do anything via GitHub again.

rafmudaf avatar Jan 12 '23 15:01 rafmudaf

Bah, I'll need to debug offline

reos-rcrozier avatar Jan 12 '23 15:01 reos-rcrozier

@rafmudaf I have enabled actions in my fork and with recent changes to the branch all tests are passing except for rtest-interfaces. You can see the test results here.

The failed tests seem to be timeouts in some of the tests, I don't really know how severe this is, will it prevent merging?

reos-rcrozier avatar Jan 19 '23 16:01 reos-rcrozier

two checks are failing but one is after 6 seconds due to:

Error: No file in /home/runner/work/openfast/openfast matched to [**/requirements.txt], make sure you have checked out the target repository

other is due to timeouts.

reos-rcrozier avatar Jan 19 '23 23:01 reos-rcrozier

@reos-rcrozier, that is a strange error. We've had some issues with GitHub actions occasionally failing. Rerunning the failed jobs often will fix it, so I'm doing that now.

andrew-platt avatar Jan 19 '23 23:01 andrew-platt

After rerunning tests multiple times, there are some consistent failures in the python OF test cases. Example from recent run:

32: Traceback (most recent call last): 32: File "/home/runner/work/openfast/openfast/reg_tests/executePythonRegressionCase.py", line 137, in <module> 32: openfastlib = openfast_library.FastLibAPI(openfastlib_path, caseInputFile) 32: File "/home/runner/work/openfast/openfast/reg_tests/../glue-codes/python/openfast_library.py", line 25, in __init__ 32: super().__init__(library_path) 32: File "/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/ctypes/__init__.py", line 374, in __init__ 32: self._handle = _dlopen(self._name, mode) 32: OSError: /home/runner/work/openfast/openfast/build/reg_tests/glue-codes/python/../../../modules/openfast-library/libopenfastlib.so: cannot open shared object file: No such file or directory

I can't tell yet if this is due to something in GH, or due to something in the changes (I haven't looked in detail yet).

There are also some consistent failures in the module regression tests that are not making much sense to me. I'm rerunning the entire set to see if this clears.

andrew-platt avatar Jan 20 '23 22:01 andrew-platt

could be a real issue, I will check if libopenfastlib.so is still actually created

reos-rcrozier avatar Jan 20 '23 23:01 reos-rcrozier

The failing test was due to my omitting to make openfastlib always be shared, as the python interface requires this (it uses dlopen). Now that is corrected all tests are passing. I think the PR is now ready for merge. It's worth noting it might have an impact on any branches under development which are for new modules not present in these sources. They might also need to be modified to use object libraries.

reos-rcrozier avatar Jan 25 '23 15:01 reos-rcrozier

@rafmudaf @andrew-platt any chance to consider the merge? Sorry to push, but it would be good to complete this while it's fresh in my mind.

reos-rcrozier avatar Feb 02 '23 13:02 reos-rcrozier

@rafmudaf @andrew-platt sorry, but pinging again to see if this can now be considered for merge?

reos-rcrozier avatar Feb 28 '23 21:02 reos-rcrozier

@reos-rcrozier, my apologies for the delay in getting to this. I have a few concerns about what impacts this might have on some of the systems we deploy on, but have not had a chance to test your branch or do a thorough review yet.

I would like to get this into our v3.5.0 release that we plan to release before the end of the month, so we will be looking at this within the next week or two.

@deslaughter, is there anything you wanted to add to the discussion at present?

andrew-platt avatar Mar 02 '23 23:03 andrew-platt

@andrew-platt we have experience cross-compiling to many platforms, so if you need testing perhaps we can help.

For example I have previously cross-compiled FAST for PowerPC and (i think, it was some time ago) arm, although some tweaks were needed to FORTRAN code for powerPC.

reos-rcrozier avatar Mar 03 '23 10:03 reos-rcrozier

Hi all. I'm not sure if this will be helpful to anyone, but to get a working MEX file on my system (Ubuntu 22.04) with USE_LOCAL_STATIC_LAPACK ON, I had to explicitly add a lot of sources to the simulink CMakeLists.txt (NWTC, PDAS, NetLib SLATEC, Supercontroller, RanLux, Openfoam). Otherwise I would get errors like: "undefined symbol: __openfoam_types_MOD_opfm_destroyoutput" when calling FAST_SFunc in Matlab.

henyau avatar Jul 05 '23 13:07 henyau

@henyau Did you do a static or shared build of OpenFAST? Doing a static build (-DBUILD_SHARED_LIBS=OFF as in the example at the top of this thread) should mean you don't have any undefined references.

reos-rcrozier avatar Jul 11 '23 09:07 reos-rcrozier

@henyau I'm also having a lot of issues with this combination of flags on Linux. It seems to work fine on my Mac. If you have the chance, could you try -DBLA_STATIC=ON instead of -DUSE_LOCAL_STATIC_LAPACK=ON? This option will find and use a static BLAS/LAPACK from your system instead of building from source. BLA_STATIC works with the Intel MKL library and has been tested on Windows and Linux.

deslaughter avatar Jul 11 '23 16:07 deslaughter

@reos-rcrozier: Yes, was doing a static build (-DBUILD_SHARED_LIBS=OFF). @deslaughter: I just tried it with -DBLA_STATIC=ON (Cmake finds and uses OpenBLAS). It compiles and links fine, but I get the same undefined symbol error as before when trying to run the Simulink model. Using Intel MKL would have been my next step if I hadn't gotten it to work.

henyau avatar Jul 11 '23 17:07 henyau

@henyau ok, can I first confirm you using create_FAST_SFunc.m to create the mex file? If so, you could also try adding some more -l options to explicitly link to the libraries there which might also solve the problem. Would also be useful to confirm the compiler suite and version you were are using please.

reos-rcrozier avatar Jul 11 '23 23:07 reos-rcrozier

Actually, I didn't realize until now that @deslaughter removed most of the changes introduced in this PR. So I probably can't help much.

reos-rcrozier avatar Jul 11 '23 23:07 reos-rcrozier

@deslaughter A bit frustrating that all this work we did, specifically to make simulink mex functions work properly on lInux was simply removed without tagging me to see if we could resolve the mac issue. I don't think removing the object libs was the right way to go.

reos-rcrozier avatar Jul 11 '23 23:07 reos-rcrozier

I just tried it with -DBLA_STATIC=ON (Cmake finds and uses OpenBLAS). It compiles and links fine, but I get the same undefined symbol error as before when trying to run the Simulink model. Using Intel MKL would have been my next step if I hadn't gotten it to work.

@henyau I was able to reproduce the issue on Linux and tracked it down to the order that the libraries are linked in glue-codes/simulink/CMakeLists.txt. Could you try the following order and let me know if it resolves the issue? If not, please include the full error message from Matlab.

  $<TARGET_FILE:openfast_prelib>
  $<TARGET_FILE:aerodyn14lib>
  $<TARGET_FILE:aerodynlib>
  $<TARGET_FILE:beamdynlib>
  $<TARGET_FILE:elastodynlib>
  $<TARGET_FILE:extptfm_mckflib>
  $<TARGET_FILE:feamlib>
  $<TARGET_FILE:hydrodynlib>
  $<TARGET_FILE:icedynlib>
  $<TARGET_FILE:icefloelib>
  $<TARGET_FILE:maplib>
  $<TARGET_FILE:moordynlib>
  $<TARGET_FILE:orcaflexlib>
  $<TARGET_FILE:sctypeslib>
  $<TARGET_FILE:servodynlib-matlab>
  $<TARGET_FILE:subdynlib>
  $<TARGET_FILE:foamfastlib>
  $<TARGET_FILE:ifwlib>
  $<TARGET_FILE:scfastlib>
  $<TARGET_FILE:foamtypeslib>
  $<TARGET_FILE:versioninfolib>
  $<TARGET_FILE:nwtclibs-matlab>

deslaughter avatar Jul 12 '23 13:07 deslaughter