dlib icon indicating copy to clipboard operation
dlib copied to clipboard

Fresh CMakeLists.txt

Open pfeatherstone opened this issue 1 year ago • 109 comments

This is an attempt at refactoring dlib's CMakeLists.txt files. The objectives are:

  • Modern cmake, so no globals, very few variables, use the target_ functions
  • Use best practices where possible (i'm not a cmake expert so please shout out if you see a better way of doing something)
  • Reduce lines of code as much as possible
  • All deps are imported using find_package()
  • Use CMake's FindCUDAToolkit and others.

pfeatherstone avatar Feb 06 '24 21:02 pfeatherstone

2 problems:

  • The implementation of BLAS in external/cblas is broken. Missing symbols
  • Cmake's findBLAS isn't detecting any blas libraries. So it's trying to use dlib's own version. Go to last point I'll have a look tomorrow.

pfeatherstone avatar Feb 06 '24 22:02 pfeatherstone

So added the missing BLAS files in external/blas. I didn't know there was a difference between cblas and blas. I have no idea how this was working before. We would have got errors like: "undefined reference to dgemm_" in file cblas_dgemm.c

Otherwise, I think the FindBLAS module in cmake is broken. Shame. It doesn't always detect perfectly valid BLAS libraries. So I'll quickly write a custom one tomorrow (steal quite a bit from dlib's old file)

Otherwise, the Python tests are failing in test_global_optimization.py It has to do with translating exceptions between c++ and Python. No idea how to attack that one and don't know what's caused that.

pfeatherstone avatar Feb 07 '24 22:02 pfeatherstone

The problem with test_global_optimization.py is DLIB_CASSERT(). The second time the assert gets hit, the program aborts. That's expected behaviour.

So I'm not sure how:

with raises(Exception):
    find_max_global(lambda a, b: 0, [0, 0, 0], [1, 1, 1], 10)
with raises(Exception):
    find_min_global(lambda a, b: 0, [0, 0, 0], [1, 1, 1], 10)
with raises(Exception):
    find_max_global(lambda a, b, c, d, *args: 0, [0, 0, 0], [1, 1, 1], 10)
with raises(Exception):
    find_min_global(lambda a, b, c, d, *args: 0, [0, 0, 0], [1, 1, 1], 10)

in test_global_optimization.py is supposed to work with DLIB_CASSERT() being used.

I've replaced a couple DLIB_CASSERT() statements with if(!(condition)) throw std::invalid_argument(msg);

pfeatherstone avatar Feb 08 '24 08:02 pfeatherstone

I still need to add building examples and tools in main CMakelists.txt script. Then everything is built in one place. I think that's better.

There are a few compiler specific options which haven't been included. I don't know if those are still needed anymore. The days of using gcc 4 are gone for example.

I need to enable all warnings on unit tests in a nice modern cmake way. I don't like all these compiler specific clauses. There must be a better way.

Need to fix the matlab script and include that in main CMakelists.txt script.

Oh, and write custom FindBLAS module as the built-in cmake one is dodgy.

I also think the setup.py script could be cleaned up. But that can be another PR.

pfeatherstone avatar Feb 08 '24 11:02 pfeatherstone

So added the missing BLAS files in external/blas. I didn't know there was a difference between cblas and blas. I have no idea how this was working before. We would have got errors like: "undefined reference to dgemm_" in file cblas_dgemm.c

Otherwise, I think the FindBLAS module in cmake is broken. Shame. It doesn't always detect perfectly valid BLAS libraries. So I'll quickly write a custom one tomorrow (steal quite a bit from dlib's old file)

external/cblas is used only when linking into matlab. Matlab has a blas in it that works with the blas in external/cblas. Otherwise external/cblas should never be used. It isn't broken though.

davisking avatar Feb 08 '24 13:02 davisking

The problem with test_global_optimization.py is DLIB_CASSERT(). The second time the assert gets hit, the program aborts. That's expected behaviour.

So I'm not sure how:

with raises(Exception):
    find_max_global(lambda a, b: 0, [0, 0, 0], [1, 1, 1], 10)
with raises(Exception):
    find_min_global(lambda a, b: 0, [0, 0, 0], [1, 1, 1], 10)
with raises(Exception):
    find_max_global(lambda a, b, c, d, *args: 0, [0, 0, 0], [1, 1, 1], 10)
with raises(Exception):
    find_min_global(lambda a, b, c, d, *args: 0, [0, 0, 0], [1, 1, 1], 10)

in test_global_optimization.py is supposed to work with DLIB_CASSERT() being used.

I've replaced a couple DLIB_CASSERT() statements with if(!(condition)) throw std::invalid_argument(msg);

Keep using DLIB_CASSERT. There is a build rule for python that makes DLIB_CASSERT not abort on the second failure by setting -DDLIB_NO_ABORT_ON_2ND_FATAL_ERROR

davisking avatar Feb 08 '24 13:02 davisking

So added the missing BLAS files in external/blas. I didn't know there was a difference between cblas and blas. I have no idea how this was working before. We would have got errors like: "undefined reference to dgemm_" in file cblas_dgemm.c Otherwise, I think the FindBLAS module in cmake is broken. Shame. It doesn't always detect perfectly valid BLAS libraries. So I'll quickly write a custom one tomorrow (steal quite a bit from dlib's old file)

external/cblas is used only when linking into matlab. Matlab has a blas in it that works with the blas in external/cblas. Otherwise external/cblas should never be used. It isn't broken though.

Sorry, it isn't broken. I thought it was initially, then i found out it was simply missing the base BLAS routines like dgemm_ etc, which functions like cblas_dgemm call under the hood. I thought external/cblas was meant to be used on any system which didn't have a valid blas library.

pfeatherstone avatar Feb 08 '24 13:02 pfeatherstone

So added the missing BLAS files in external/blas. I didn't know there was a difference between cblas and blas. I have no idea how this was working before. We would have got errors like: "undefined reference to dgemm_" in file cblas_dgemm.c Otherwise, I think the FindBLAS module in cmake is broken. Shame. It doesn't always detect perfectly valid BLAS libraries. So I'll quickly write a custom one tomorrow (steal quite a bit from dlib's old file)

external/cblas is used only when linking into matlab. Matlab has a blas in it that works with the blas in external/cblas. Otherwise external/cblas should never be used. It isn't broken though.

Sorry, it isn't broken. I thought it was initially, then i found out it was simply missing the base BLAS routines like dgemm_ etc, which functions like cblas_dgemm call under the hood. I thought external/cblas was meant to be used on any system which didn't have a valid blas library.

Na it's just a matlab shim.

davisking avatar Feb 08 '24 13:02 davisking

Na it's just a matlab shim.

Ok, I'll fix the cmake script later which only pulls in external/cblas when building matlab.

pfeatherstone avatar Feb 08 '24 13:02 pfeatherstone

Cmake has the FindMatlab module which looks good and the handy matlab_add_mex cmake function. Might try that

pfeatherstone avatar Feb 08 '24 13:02 pfeatherstone

Nearly there, just need to fix FindBLAS and matlab then done I think.

pfeatherstone avatar Feb 09 '24 13:02 pfeatherstone

The Matlab build is so close. I think the problem now is RPATH is stripped during install and the example mex files can't find libmex, libmx and libeng anymore. Nearly there. Otherwise I'm pretty happy with everything. I think more cleanup could be done with generator expressions. But fine for now.

EDIT: Even though RPATH is stripped, that wasn't the problem. Problem now is mismatching glibc. Apparently this is a problem with Matlab. Don't understand what's different to how dlib normally builds the MEX wrappers. I guess it's the cmake helper macro

pfeatherstone avatar Feb 12 '24 22:02 pfeatherstone

Oh, wow! Well done! All checks passed now!

arrufat avatar Feb 14 '24 01:02 arrufat

Oh yeah sick. I'll try it out 😁

davisking avatar Feb 14 '24 01:02 davisking

Tried it on an ubuntu 18.04 machine with cmake version 3.29.0-rc1 and got an error though trying to build the tests.

davis@plonk:~/source/dlib/build$ make
[  1%] Building CUDA object dlib/CMakeFiles/dlib.dir/cuda/cuda_dlib.cu.o
nvcc fatal   : 'avx2': expected a number

cuda/nvcc is always the boss at the end of any build tooling :|

You putting the CMakeLists.txt back in the individual folders after figuring out how to build it all I'm guessing?

davisking avatar Feb 14 '24 03:02 davisking

Shall we add a CI runner for Ubuntu18 ?

I was thinking of building everything from 1 CMakeLists.txt file. I much prefer it personally. I think it's tidier. There is also less duplication I find.

pfeatherstone avatar Feb 14 '24 07:02 pfeatherstone

What version of the CUDA toolkit and c++ compiler were you using? I'll build a VM tonight and try reproduce.

pfeatherstone avatar Feb 14 '24 07:02 pfeatherstone

So when building the Matlab MEX stuff I found I didn't need any cblas code at all. That was with latest Matlab though. But I imagine it will be the case from R2018 onward or something. It seems like that was a major new release at the time. So maybe we can remove all external/cblas completely.

While going through areas of the repo, I found there were a few things that needed a refresh. There is some old school c++98 style meta programming here and there. I'll change that in another PR.

pfeatherstone avatar Feb 14 '24 07:02 pfeatherstone

Shall we add a CI runner for Ubuntu18 ?

I was thinking of building everything from 1 CMakeLists.txt file. I much prefer it personally. I think it's tidier. There is also less duplication I find.

Ubuntu 18 was EOL last year, should we still support it?

arrufat avatar Feb 14 '24 09:02 arrufat

I think this is more a problem with gcc, NVCC and cmake versions. If I can get those from @davisking i can reproduce on my machine.

It could just be that some compiler flags must only be passed to C/c++ files, not .cu files. Don't know but I'm pretty sure this is easily fixed.

pfeatherstone avatar Feb 14 '24 12:02 pfeatherstone

Shall we add a CI runner for Ubuntu18 ?

Na, it's depreciated. I need to try it on a newer machine and see what happens. Did you build the cuda stuff locally? I just assumed it was not working for things that used cuda.

I was thinking of building everything from 1 CMakeLists.txt file. I much prefer it personally. I think it's tidier. There is also less duplication I find.

Totally keep the separate CMakeLists.txt files. People have to be able to use dlib without editing any files in the dlib/ folder. And the CMakeLists.txt are examples of how to do that. We should do things the way others would as well. Good libraries are just libraries on top of libraries :) So it's important that it be real clean and easy to make some new application with its own CMakeLists.txt that depends on dlib. It also keeps the concerns clearly separated.

davisking avatar Feb 14 '24 13:02 davisking

What version of the CUDA toolkit and c++ compiler were you using? I'll build a VM tonight and try reproduce.

cuda 11.0 and g++ 7.5.0

davisking avatar Feb 14 '24 13:02 davisking

So when building the Matlab MEX stuff I found I didn't need any cblas code at all. That was with latest Matlab though. But I imagine it will be the case from R2018 onward or something. It seems like that was a major new release at the time. So maybe we can remove all external/cblas completely.

It's not super clear to me from the logs that the matlab mex stuff used blas. Like maybe dlib was just not compiled to use BLAS or LAPACK at all. I'm kinda surprised the cblas shim isn't needed and that would explain why it wasn't needed if that's what happened. There needs to be a bit more logging in the cmake files. Like the stuff that prints the dlib version and compiler version is important. Same for some of the logging statements that clearly indicate if dlib is using a blas library or not and things like that.

While going through areas of the repo, I found there were a few things that needed a refresh. There is some old school c++98 style meta programming here and there. I'll change that in another PR.

Yeah definitely in other PRs.

davisking avatar Feb 14 '24 13:02 davisking

Shall we add a CI runner for Ubuntu18 ? I was thinking of building everything from 1 CMakeLists.txt file. I much prefer it personally. I think it's tidier. There is also less duplication I find.

Ubuntu 18 was EOL last year, should we still support it?

Maybe, but I don't want to unnecessarily break things though.

davisking avatar Feb 14 '24 13:02 davisking

I've been building with CUDA on multiple machines no problem, using CUDA versions 11.8 and 12.3. Looking online, CUDA 11.0 has the issue you were describing.

The Matlab stuff definitely had BLAS enabled. If you check the symbols of the compiled libdlib library you will see cblas_dgemm, dgemm_ and co. You won't see them in the mex wrappers since all symbols apart from MEX specific stuff are hidden.

I'll separate the CMakeLists.txt file again. I personally prefer everything in one place but so be it.

pfeatherstone avatar Feb 14 '24 14:02 pfeatherstone

I can add a few more logs as well.

pfeatherstone avatar Feb 14 '24 14:02 pfeatherstone

I think the newer Matlab libraries starting with libmw have all necessary BLAS and LAPACK symbols now. Maybe you had to do it previously because of an old Matlab installation

pfeatherstone avatar Feb 14 '24 14:02 pfeatherstone

I can try a really old one if you want but I doubt anyone out there is using something older than R2018

pfeatherstone avatar Feb 14 '24 14:02 pfeatherstone

@davisking shall we use https://github.com/marketplace/actions/cuda-toolkit to enable CUDA in one of the runners?

pfeatherstone avatar Feb 14 '24 21:02 pfeatherstone

So i've split CMakelists.txt back into all the separate folders. I'm still not a fan, but i'm not dlib's benevolent dictator for life :) Most libraries i've seen have XXX_BUILD_TESTS, XXX_BUILD_PYTHON, etc options in a single CMakeLists.txt file so everything can be built from one place. But hey ho.

I'll get on with adding more print statements.

pfeatherstone avatar Feb 14 '24 22:02 pfeatherstone