Build system

Open danpovey opened this issue 6 years ago • 76 comments

For the new version of Kaldi, does anyone think we should switch to a different build system, such as cmake? We should probably still have manually-run scripts that check the dependencies; I am just wondering whether the stuff we are doing in the 'configure' script would be better done with cmake, and if so, whether anyone is interested in making a prototype to at least let it compile on Linux.

danpovey avatar Mar 10 '19 19:03 danpovey

I am not a big fan of CMake. @jtrmal should have a more informed opinion, as he spent some time playing with it. To me, it looks more cryptic than gmake.

If we are open to any options at this point, we may consider Bazel. It is certainly different from other build systems, but it is very good at what it does.

Also, guys, let's please seriously reconsider transitioning to Google Test. E.g., when rewriting the logging code, I had to eyeball the log output instead of having program expected-output tests do it for me, and I missed their so-called "death test" (you throw an unhandled exception, and the test succeeds when the SoT dies; it forks a copy of itself that is heading to seppuku internally, so zero setup for that, too). And it even comes in a single-header-only variant, and is so mature that it's updated once in a few years, and super-portable, even to Windows.

kkm000 avatar Mar 11 '19 03:03 kkm000

Let me clarify on CMake. On Windows, it is doing a super-lame thing. It generates project and solution files from some templates, as if no one really put any effort into understanding what is in these files. It just stuffs filenames into templates that were apparently treated as a black box. I do not want to disparage anyone or call them out for not putting effort into making CMake a decent system on all platforms; the task is indeed daunting. But, in the end, it supports over 9000 platforms... more or less. When you have one project, it's OK. When you have as many files and libraries as Kaldi does, and want to change just one little aspect of it (e.g. suppress one warning), you are either up to editing 50 project files or regenerating everything from the ground up. In make, I just add CXXFLAGS=, and Bob's yer uncle. It's not what I would rate as the best feature of a build system. It is this inherent property of being a "meta-build" system, not a build system sensu stricto, that I dislike.

kkm000 avatar Mar 11 '19 04:03 kkm000

I find cmake better than plain make, mostly for its support for including dependencies. However, it's also a bit of a pain to wrap your head around it. I've heard good things about bazel, in particular regarding build servers and incremental builds (both on server and client).

sikoried avatar Mar 11 '19 10:03 sikoried

What I like about cmake is that it's fairly expressive (in the sense that even if you are not familiar with it, you are able to figure out what most of the commands do). It has a good dependency tracking system (probably make depend wouldn't be needed anymore). It is fairly widely accepted (people know it) and is still being developed and supported. I agree it does weird things when generating build files for Visual Studio. But it seems MS has the intention to support CMake in VS. At least I've heard something for VS2017, not sure if it happened or if they dropped it or what -- @kkm would probably know. Big projects do use cmake -- I mean not all of them, but at least some of them -- I know about KDE and LLVM/Clang. Clang was very painless to build for me (compared to the GNU compilers). That might not be because of CMake/Make, though. I think Kaldi would be very easy to convert to CMake -- as a matter of fact, I wrote a perl script which converts the Kaldi makefiles into CMakeLists.txt. Also, OpenFST was very easy to convert to CMake.

Other options are Bazel (as Korbinian said), scons (uses python) or jam (which I know was/is used in boost). I have no experience with Bazel. I've found jam (or bjam?) quite annoying when trying to build boost. I didn't spend too much time on scons.

jtrmal avatar Mar 11 '19 14:03 jtrmal

it seems MS has the intention to support CMake in the VS. At least I've heard something for VS2017, not sure if it happened or if they dropped it or what -- @kkm would probably know.

They do. Which does not mean you can open a Kaldi-sized project with it. Since CMake generates about 500 separate project files, one per library and one per exe, it just blows up and dies spectacularly. For smaller projects, like zlib, it worked okay... It's not even MS to blame this time, it's how CMake handles Visual Studio. One output, one separate project in a separate directory.

I do not care much about building the whole rig on Windows. I tried it once, and it was less than exciting in the end. I even ran an eg end to end (tedlium, IIRC), using Cygwin bash. I even added support for it to the Kaldi code, so that things like popen pipe stuff through bash.

Windows 10 has added two interesting features. One is real functioning symlinks (in fact, they were there in NTFS all the time, but with a twist: you needed to be an admin to create symlinks, and, while you can grant any account a separate privilege to create symlinks, this privilege is unconditionally suppressed for admin accounts, unless you elevate, i.e. "sudo". So, to translate to Linux lingo, you must either be a non-sudoer or create symlinks under sudo. In W10, all you need is to enable developer mode).

The second is native support for Linux user-mode binaries. This is an interesting way to go, but I am not sure if this will cut it for Kaldi. I'll try at some point. It's an interesting Linux. It actually runs in a kind of container, with a special init. Everything else in user mode is just a normal Linux, with ELF binaries running out of the box. There are a few distros available, including Ubuntu, which I use. I have no idea if this thing supports CUDA though, and do not really hold my breath for it.

kkm000 avatar Mar 11 '19 18:03 kkm000

I think cmake vs. configure-script plus make is the real choice here. Including improvements to the existing configure scripts.

My experiences with bazel have been terrible, mostly because it's extremely hard to build itself. cmake does seem to be quite popular, e.g. pytorch uses it and I think some of Kaldi's own dependencies require it. I don't have much experience with it though.

danpovey avatar Mar 11 '19 18:03 danpovey

I have personally enjoyed using CMake quite a lot. It is a fairly involved build system to set up, but it is also easiest to extend in an already existing C++ project, imo.

It also has the best CUDA support. Instead of our configure script, which goes through a bunch of weird places where all the BLAS variants can be installed, you can just do find(BLAS) (or something like that). And you no longer need make depend! Hallelujah.
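For illustration, the BLAS lookup mentioned above is spelled `find_package(BLAS)` in CMake. The following is a minimal sketch, not taken from the thread; the project name, the target name `kaldi-matrix-sketch` and the source file `matrix-sketch.cc` are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.10)
project(blas_lookup_sketch LANGUAGES CXX)

# Optional hint: BLA_VENDOR restricts the search to one implementation
# (for example OpenBLAS or ATLAS). Left unset, any BLAS that is found is accepted.
# set(BLA_VENDOR OpenBLAS)

# The stock FindBLAS / FindLAPACK modules search the usual install locations
# and fail the configure step if nothing is found.
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

# Hypothetical library target, used only to show how the result is consumed.
add_library(kaldi-matrix-sketch matrix-sketch.cc)
target_link_libraries(kaldi-matrix-sketch
  PUBLIC ${LAPACK_LIBRARIES} ${BLAS_LIBRARIES})
```

Header dependencies for `matrix-sketch.cc` are tracked by the generated build files, which is why a separate `make depend` step is not needed.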

It even supports building pip wheels for python projects with C++ dependencies easily via scikit-build https://github.com/scikit-build/scikit-build. I've done it myself here. It requires very few lines of code. The setup.py file https://github.com/galv/galvASR/blob/dev/setup.py is small. And the CMakeLists.txt file basically just needs a single line to build a shared object for python to load: https://github.com/galv/galvASR/blob/dev/galvASR/tensorflow_ext/CMakeLists.txt#L15

No idea about cmake's support for windows. Frankly, it has never been a big priority for me. I would say that is true for a lot of the ML/scientific computing community.

galv avatar Mar 11 '19 23:03 galv

I would also like to mention that I personally like how cmake makes installing dependencies really, really easy.

For example, even though wav2letter and flashlight have a nightmarish number of dependencies, it is actually fairly easy to install them via cmake because almost all of the dependencies are cmake projects. I made an example project doing that here: https://github.com/galv/wav2letter-sample-project/blob/master/CMakeLists.txt I probably should have shown that to the FAIR people at some point, since their installation instructions are painful and use too much Docker, but oh well.

galv avatar Mar 11 '19 23:03 galv

The Windows side of cmake might be weak in how it does what it does, but if the input is the same for me, I don't see why I should care what cmake does behind the scenes. The current process within Kaldi has to be maintained, and with cmake that maintenance goes down considerably.

Part of make is pretty simple and that part is very good, but right now the configure script has to be maintained, and it seems to me that cmake is better than this scenario. As far as what Google puts out, they tend to release a tool and never update it for a decade or more. Mostly unsupported software. Also, CMake has many Find... scripts that already exist.

I'd not over-engineer the build system. And we use cmake here, so I've used it a bunch. There are absolutely things I like and things I don't. I'd take a simple makefile any day, but you don't start with that.

btiplitz avatar Mar 12 '19 00:03 btiplitz

pybind11 has good integration with cmake too. If you are using cmake, it makes it quite easy to generate the python package or whatever it is, I think.
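As a rough sketch of that integration (not from the thread), pybind11 ships a CMake package that provides `pybind11_add_module`. The project name, the module name `kaldi_py_sketch` and the file `bindings.cc` are hypothetical, and pybind11 is assumed to be installed where CMake can find it.

```cmake
cmake_minimum_required(VERSION 3.15)
project(kaldi_python_sketch LANGUAGES CXX)

# Locate an installed pybind11 through its CMake config package.
find_package(pybind11 CONFIG REQUIRED)

# Builds a Python extension module from the (hypothetical) binding source.
pybind11_add_module(kaldi_py_sketch bindings.cc)
```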

danpovey avatar Mar 12 '19 03:03 danpovey

My experience with CMake has also been positive so far, even though I've only used it for relatively small projects. It also has good support for cross-compilation via the so-called toolchain files. IIRC Google's Android build system (based on Gradle) is using CMake behind the scenes to build the JNI code.

TL;DR: +1 for CMake.
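For readers who have not seen one, a toolchain file is a small CMake script that describes the target platform and compilers. The sketch below is an assumption-laden example, not something from the thread: the file name `aarch64-toolchain.cmake` is hypothetical, and the compiler names assume a GNU cross toolchain for 64-bit ARM Linux is installed.

```cmake
# aarch64-toolchain.cmake (hypothetical file name)

# Describe the target system rather than the build host.
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)

# Cross compilers; these names are assumptions about the installed toolchain.
set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)

# Look for helper programs on the host, but restrict library and header
# searches to the target root (CMAKE_FIND_ROOT_PATH / sysroot).
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
```

It would be passed at configure time with `-DCMAKE_TOOLCHAIN_FILE=aarch64-toolchain.cmake`.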

vdp avatar Mar 12 '19 06:03 vdp

Looks like there is pretty much a consensus on CMake. : )

Speaking of Windows, if you're up to running a full training pipeline, CMake is going to be the least of your problems anyway, so I am not worrying much about its poor Windows support.

kkm000 avatar Mar 12 '19 08:03 kkm000

My experiences with cmake have been generally positive too. Recently I had some trouble configuring a project to make shared libraries (instead of static) but that's probably because I have never read any cmake manuals/tutorials. So I think cmake might be good.

However, the current build system is already very good in the sense that it works on most machines without any issues. So unless we plan to add a lot more dependencies to Kaldi, I'm not sure what the benefit of switching to cmake would be.

hhadian avatar Mar 12 '19 13:03 hhadian

We can assess whether cmake is actually better once someone actually comes up with a proposed build system based on cmake. I am hoping someone will volunteer for that. One advantage is that if we do python wrapping with pybind, cmake makes that super easy I believe. And it should be less effort to maintain when things like new Debian or Red Hat versions come out, as we are piggybacking off cmake's work.

danpovey avatar Mar 12 '19 17:03 danpovey

@danpovey I will volunteer for it. I have the experience it takes. I'm not entirely sure about all of the work involved, but I hope to essentially eradicate the tools directory, replacing it with a single .cmake file which downloads and builds the dependencies for us, as well as converting the src/ directory to use CMakeLists.txt.

In case I get unresponsive, feel free to ping me. If anyone else would like to do this as well, you can also ping me.

I am working on top of your kaldi10 branch, Dan.

galv avatar Mar 12 '19 17:03 galv

Great. Regarding the 'tools' directory, bear in mind that it contains two types of things: (1) libraries and headers that are required to build Kaldi (2) tools, some mandatory (OpenFst), some optional, that are required to run some Kaldi recipes, and ways to install them. For (1), it may make sense to get cmake more involved; but for (2) we still need the tools/ directory.

Also there may be reasons to avoid the "system" versions of, say, OpenFst; for example, it might not be built with the flags that we need it to be built with, or something like that. I guess what I'm saying is: don't be too aggressive, and remember that the convenience of users is paramount; if there is a choice between ease of installation (or being more robust), vs. doing things the "cmake" way, I don't want to do things the "cmake" way.

The check_dependencies.sh script may still be quite helpful because it is quite explicit about how to fix problems. My experience with cmake errors is that while, to me, it tends to be pretty obvious how to address them, they are definitely not obvious to all of the kinds of people who ask questions on the Kaldi lists.

danpovey avatar Mar 12 '19 17:03 danpovey

I see. What I like about CMake is that you can add external projects, even those that don't use cmake, like openfst. You can dig around in this file to see what I mean: https://github.com/galv/galvASR/blob/7d5d7826805cbbd0b40954f7eec262f0a7e35f01/galvASR/cmake/external.cmake#L49 It can even download archives and unzip them for installation as well, like we do manually.

So you could imagine removing the tools/Makefile target and replacing it with a new cmake target which encodes the same thing. I'll see if it is reasonably clean to do or not.
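To make the idea concrete, here is a minimal sketch of wrapping a non-CMake dependency such as OpenFst with the `ExternalProject` module. It is not the content of the linked external.cmake file; the target name `openfst_external` and the variables `OPENFST_URL` and `OPENFST_PREFIX` are hypothetical, and the URL is deliberately left for the user to fill in.

```cmake
include(ExternalProject)

# Placeholder: must be set to the OpenFst source tarball you want to build,
# e.g. with -DOPENFST_URL=... at configure time.
set(OPENFST_URL "" CACHE STRING "URL of the OpenFst source tarball")

# Install into the build tree rather than into system directories.
set(OPENFST_PREFIX ${CMAKE_BINARY_DIR}/openfst-install)

# Download, unpack, then run the dependency's own configure/make steps.
# <SOURCE_DIR> is expanded by ExternalProject to the unpacked source tree.
ExternalProject_Add(openfst_external
  URL ${OPENFST_URL}
  CONFIGURE_COMMAND <SOURCE_DIR>/configure --prefix=${OPENFST_PREFIX}
  BUILD_COMMAND make
  INSTALL_COMMAND make install
)
```

The download and build happen at build time, so targets that need the headers or libraries under `OPENFST_PREFIX` would have to depend on `openfst_external`.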

(My experience mucking with these details and my unwavering commitment to C++ in spite of it is why I am volunteering for this. I'm not sure how many people have the patience for all the details of C++ build systems!)

Using C++ system libraries rather than project-local libraries which you build yourself is just an exercise in futility nowadays unless you're at a big company which can afford to custom tailor its build, especially after libstdc++ broke the std::string and std::list ABI to be C++11 compatible.

galv avatar Mar 12 '19 17:03 galv

@galv I should be able to help you test some of the variants including windows.

btiplitz avatar Mar 14 '19 00:03 btiplitz

IMO CMake is the best of the current c++ build systems. It is widely supported, almost all developers know how to use it and it has good support for third party libraries and also a number of ways of finding third party libraries.

If you do go down the CMake route it might also be worth looking at conan package manager as an option for the configuration of the third party libraries / tools.

ttroy50 avatar Mar 18 '19 19:03 ttroy50

+1 for CMake

It's a pain in the ass when you want to integrate a makefile project with a CMake one.

With cmake, you can use vcpkg to handle your dependencies on windows.

cloudhan avatar Mar 25 '19 03:03 cloudhan

@ttroy50, @cloudhan

Guys, I hate to be a party-crasher, but do not get too gung-ho about package management of CMake, or conan or vcpkg or any package manager whatsoever. Keep in mind that the obligatory Kaldi dependencies are CUDA, MKL and OpenFST, which aren't there. Minor dependencies, maybe.

kkm000 avatar Mar 25 '19 19:03 kkm000

@kkm000 Good point. I'm not sure about CUDA but it should be possible to package OpenFST.

And from a technical point of view I think MKL would be possible too, but it might not be allowed because of licensing.

ttroy50 avatar Mar 26 '19 11:03 ttroy50

I don't think packaging all the dependencies is a good idea. It might be necessary to run some pre-script that downloads the required packages, like OpenFST. I am pretty sure this could be done within CMake with an execute_process command, either calling a script or containing the actual download commands. Putting MKL into the pull would be kind of crazy: I've got multiple copies of Kaldi, and this would balloon my storage requirements. MKL is expected to be installed in specific locations, and a source tree is not a reasonable location for it, for many reasons. CUDA is installed with an RPM or another package management system. There is no reason for this to change.

btiplitz avatar Mar 26 '19 12:03 btiplitz

If you intend to implement a CMake build system for Kaldi, I thoroughly recommend taking 2 or 3 days to read Professional CMake: A Practical Guide (https://crascit.com/professional-cmake/) by Craig Scott. It's the best resource I've found for learning what CMake is capable of, and what you should and should not do.

For downloading dependencies, CMake offers several options:

  1. FetchContent downloads a dependency at config time. Works best if the dependency is source code that can be built by CMake. Also works if the dependency is a pre-compiled binary.
  2. ExternalProject_Add for downloading a dependency at build time. Works well for all types of dependencies but often implies that you use a "Superbuild" CMake structure. Allows you to define your own Config, Build, Install, and other custom steps if dependency doesn't use CMake. When I say "all types" I mean:
    • Source that builds using CMake
    • Source that builds using some other build system
    • Pre-compiled binaries
    • If a dependency is installed using a package manager then it may require administrator privilege and should be installed manually before initiating the CMake build. Or perhaps you could use (4) or (5) below to invoke the package manager at config time.
  3. You can use add_custom_command to run a script at build time. This custom command can then be wrapped in a CMake target using add_custom_target so other targets can depend on the outputs of the script.
  4. You can use include(script.cmake) to run a CMake script at config time, using the same variable scope as the rest of your project.
  5. Finally, as @btiplitz mentioned, you can use execute_process to run a script at config time.
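
To make option (3) concrete, here is a hedged sketch of the build-time pattern using `add_custom_command`, the command that pairs with `add_custom_target`. The script path, stamp file, and library name are hypothetical and only show the wiring:

```cmake
# Stamp file that records that the fetch script has already run.
set(FETCH_STAMP "${CMAKE_CURRENT_BINARY_DIR}/deps.stamp")

# Build-time rule: run the (hypothetical) fetch script, then touch the stamp.
# The rule re-runs only if the stamp is missing or the script changes.
add_custom_command(
    OUTPUT "${FETCH_STAMP}"
    COMMAND "${CMAKE_COMMAND}" -P "${CMAKE_CURRENT_SOURCE_DIR}/cmake/fetch_deps.cmake"
    COMMAND "${CMAKE_COMMAND}" -E touch "${FETCH_STAMP}"
    DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/cmake/fetch_deps.cmake"
    COMMENT "Fetching third-party dependencies"
    VERBATIM
)

# Wrap the rule in a target so other targets can depend on it.
add_custom_target(fetch_deps DEPENDS "${FETCH_STAMP}")

# Hypothetical library that must not build before the fetch has happened.
add_library(mylib STATIC src/mylib.cc)
add_dependencies(mylib fetch_deps)
```

Nothing is fetched at config time here; the script runs the first time something that depends on `fetch_deps` is built.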

I try to avoid native scripts like bash and bat because they break cross-platform compatibility and create duplicate work. Using Python or Perl scripts is okay if you're okay with adding them as a dependency. I prefer to write cross-platform CMake scripts and invoke them using the ${CMAKE_COMMAND} -P script.cmake calling syntax.
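
As an illustration of that calling syntax, a cross-platform download-and-unpack script could look roughly like this. The file name and the `URL` / `DEST_DIR` parameters are made up for the example; only built-in CMake commands are used:

```cmake
# fetch_archive.cmake (hypothetical name). Invoke in script mode, with the
# -D definitions placed before -P:
#   cmake -DURL=<archive url> -DDEST_DIR=<dir> -P fetch_archive.cmake
# or at config time from a CMakeLists.txt:
#   execute_process(COMMAND "${CMAKE_COMMAND}" -DURL=... -DDEST_DIR=...
#                           -P "${CMAKE_CURRENT_LIST_DIR}/fetch_archive.cmake"
#                   RESULT_VARIABLE fetch_rc)

if(NOT DEFINED URL OR NOT DEFINED DEST_DIR)
    message(FATAL_ERROR "Usage: cmake -DURL=<url> -DDEST_DIR=<dir> -P fetch_archive.cmake")
endif()

set(archive "${DEST_DIR}/download.tar.gz")
file(MAKE_DIRECTORY "${DEST_DIR}")

# file(DOWNLOAD) reports its result as a two-element list: code;message.
file(DOWNLOAD "${URL}" "${archive}" STATUS status)
list(GET status 0 code)
if(NOT code EQUAL 0)
    list(GET status 1 msg)
    message(FATAL_ERROR "Download failed: ${msg}")
endif()

# "cmake -E tar" works the same on Linux, macOS and Windows.
execute_process(
    COMMAND "${CMAKE_COMMAND}" -E tar xzf "${archive}"
    WORKING_DIRECTORY "${DEST_DIR}"
    RESULT_VARIABLE rc
)
if(NOT rc EQUAL 0)
    message(FATAL_ERROR "Extraction of ${archive} failed")
endif()
```

Because the script needs neither bash nor bat, the same file serves every platform CMake runs on.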

For creating CMake targets once your dependencies are downloaded, CMake offers several options:

  1. Ideally, your dependency provides a config module DependencyConfig.cmake that defines CMake targets for all of its build outputs along with the transitive dependencies between them. It's possible for them to provide a CMake config module even if they do not build using CMake.
  2. If your dependency does not provide a config module, then you can write your own CMake find module FindDependency.cmake that attempts to find the necessary build outputs, wrap them in CMake targets, and define the transitive dependencies between them. Find modules can work with dependencies that are installed via package manager as well.

Once targets are defined, linking against a dependency is as simple as adding it to your target_link_libraries() list.

EDIT: spelling

MantisClone avatar Mar 26 '19 14:03 MantisClone

I am still torn on whether to go the CMake route. I guess I feel that just refactoring the 'configure' script (e.g. having it call other scripts that are easier to read) might also be a viable option, as it would be so much more self-explanatory than a CMake setup, and easier to modify. I mean, CMake is just so much framework.

danpovey avatar Mar 26 '19 16:03 danpovey

@DMats Hi, I haven't met you before, but thank you for recommending that book! I wasn't aware that an authoritative book had actually been written on cmake. I had picked up most of my knowledge from presentations.

I was personally planning to require people to type make in the tools/ directory for now. The reason is that if you use ExternalProject_Add and ever do a make clean in your build directory, you will have to rebuild your dependencies from scratch. And rebuilding openfst is slow.

Regarding packaging openfst and then doing find_package(OpenFST), no, I don't think that's a good idea. The ABI compatibility is too complicated. We can already build it ourselves, and create an "imported target" in cmake which provides its include directories and libraries for us. I did this in a local change on my PR #3100, but I need to rebase that on top of the latest kaldi10 and push, and I need to run.

galv avatar Mar 26 '19 16:03 galv

I've been writing up a list of nice things that cmake will give us.

They are:

  • Easy ARM Cross-compilation and simulation (You can run the tests in an ARM simulator via a switch to cmake's ctest tool). I personally think this one is very cool.
  • It is easy to create a python package with native dependencies via cmake, via scikit-build (I've used it before, and it is professional software, not some hack). I would say this is a very big deal.
  • Export Kaldi as a dependency to other cmake projects, since everyone else is using cmake anyway :P We can tighten up dependencies via INTERFACE and non-INTERFACE properties in cmake (PUBLIC dependencies populate both, PRIVATE only the latter). For example, in my current cmake-based build PR, I made sure that our build system won't have to expose the macros HAVE_CBLAS, HAVE_OPENBLAS, HAVE_MKL to any downstream dependers upon the matrix library.
  • Easy to do tests with valgrind and cuda-memcheck via adding command line flags when running our tests. No more writing custom rules.
  • Possibly faster builds with ninja. CMake can generate ninja build rules automatically for us, just as easily as Makefile rules. We would have to verify that, though.
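
To illustrate the PUBLIC/PRIVATE point from the list above, here is a rough sketch of how a compile definition can be kept out of downstream targets. The target names and source files are placeholders, not the layout used in the PR:

```cmake
# Hypothetical matrix library with a single placeholder source file.
add_library(kaldi-matrix STATIC kaldi-matrix.cc)

# PRIVATE: HAVE_MKL is defined while compiling kaldi-matrix itself,
# but is not added to INTERFACE_COMPILE_DEFINITIONS, so dependers never see it.
target_compile_definitions(kaldi-matrix PRIVATE HAVE_MKL)

# PUBLIC: the include directory is used by kaldi-matrix and also
# propagated to anything that links against it.
target_include_directories(kaldi-matrix PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}")

# Hypothetical downstream target: it inherits the include directory
# but is compiled without -DHAVE_MKL.
add_executable(matrix-consumer consumer.cc)
target_link_libraries(matrix-consumer PRIVATE kaldi-matrix)
```

This only works if the library's public headers do not themselves test the macro; otherwise the definition would have to be PUBLIC.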

galv avatar Mar 26 '19 16:03 galv

@galv Pleasure to make your acquaintance.

Respectfully, I wasn't trying to suggest that using pre-built OpenFST binaries is the best idea in Kaldi's case.

My FindKaldi.cmake find module finds OpenFST from the tools/ directory after Kaldi's build system has built it. It then exposes OpenFST as an IMPORTED CMake target Kaldi::Kaldi_OpenFst and as a set of variables Kaldi_OpenFst_INCLUDE_DIRS, Kaldi_OpenFst_LIBRARIES. Prefer to use the target.

Obviously, this find module is incomplete as it completely neglects the other build products OpenFST produces. But they could be found in effectively the same way.

#=================================
# OpenFst
#=================================

find_path(Kaldi_Tools_DIR
    NAMES extras/check_dependencies.sh
    HINTS ${Kaldi_ROOT_DIR}/tools
)

find_path(Kaldi_OpenFst_INCLUDE_DIR
    NAMES fst/fstlib.h
    HINTS ${Kaldi_Tools_DIR}/openfst/include
)

find_library(Kaldi_OpenFst_LIBRARY
    NAMES fst
    HINTS ${Kaldi_Tools_DIR}/openfst/lib
)

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(Kaldi_OpenFst
    REQUIRED_VARS Kaldi_OpenFst_INCLUDE_DIR Kaldi_OpenFst_LIBRARY
    VERSION_VAR Kaldi_OpenFst_VERSION
)
mark_as_advanced(Kaldi_OpenFst_INCLUDE_DIR Kaldi_OpenFst_LIBRARY Kaldi_OpenFst_VERSION)

if(Kaldi_OpenFst_FOUND)
    set(Kaldi_OpenFst_INCLUDE_DIRS ${Kaldi_OpenFst_INCLUDE_DIR})
    set(Kaldi_OpenFst_LIBRARIES ${Kaldi_OpenFst_LIBRARY})
endif()

if(Kaldi_OpenFst_FOUND AND NOT TARGET Kaldi::Kaldi_OpenFst)
    add_library(Kaldi::Kaldi_OpenFst UNKNOWN IMPORTED)
    set_target_properties(Kaldi::Kaldi_OpenFst PROPERTIES
        IMPORTED_LINK_INTERFACE_LANGUAGES CXX
        IMPORTED_LOCATION ${Kaldi_OpenFst_LIBRARIES}
        INTERFACE_INCLUDE_DIRECTORIES ${Kaldi_OpenFst_INCLUDE_DIRS}
    )
    target_link_libraries(Kaldi::Kaldi_OpenFst
        INTERFACE
            ${CMAKE_DL_LIBS}
    )
endif()
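
For completeness, a consuming project might use the module roughly as follows. This is a sketch that assumes FindKaldi.cmake sits in a cmake/ subdirectory and that Kaldi_ROOT_DIR points at a Kaldi checkout whose tools/ directory has already been built; the project name, source file, and KALDI_ROOT environment variable are hypothetical:

```cmake
# 3.11 is assumed here because the module calls target_link_libraries()
# on an IMPORTED target, which older CMake versions reject.
cmake_minimum_required(VERSION 3.11)
project(my_decoder CXX)  # hypothetical project

# Let find_package() locate FindKaldi.cmake in ./cmake.
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")

# Hypothetical convention: take the Kaldi checkout from an environment variable.
set(Kaldi_ROOT_DIR "$ENV{KALDI_ROOT}" CACHE PATH "Path to a built Kaldi checkout")

find_package(Kaldi)

# Check the target directly, since the excerpt above does not show
# whether the full module sets Kaldi_FOUND.
if(NOT TARGET Kaldi::Kaldi_OpenFst)
    message(FATAL_ERROR "OpenFst not found under ${Kaldi_ROOT_DIR}/tools")
endif()

add_executable(my_decoder main.cc)
# Linking the imported target brings in the library, its include
# directory, and ${CMAKE_DL_LIBS} in one step.
target_link_libraries(my_decoder PRIVATE Kaldi::Kaldi_OpenFst)
```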

EDIT: Clarify that resulting OpenFST target is imported.

MantisClone avatar Mar 26 '19 16:03 MantisClone

@danpovey I'd agree that if it takes 2-3 days to understand cmake, then that seems way too complicated. But if one person does the transition to cmake, the 2-3 days does not apply to every person using kaldi, as only the person doing the conversion needs to understand it. Once it's converted and working, it would be rare that you would need to understand everything. A Google search is pretty effective for a single task, and more effective than scanning any book.

btiplitz avatar Mar 26 '19 17:03 btiplitz

I think it looks scary because you guys started to overengineer things without thinking about the way everything in Kaldi is generally designed: not too many bells and whistles, and no many-layer frameworks with many different command line switches, because you can always change it in the source code. y.

jtrmal avatar Mar 26 '19 17:03 jtrmal