rust-bindgen icon indicating copy to clipboard operation
rust-bindgen copied to clipboard

Rewrite Clang interface to use C++ libtooling API instead of libclang

Open rinon opened this issue 5 years ago • 20 comments
trafficstars

The libclang API doesn't expose enough information about the AST for what bindgen really needs, which has spawned numerous workarounds and bugs in bindgen. This PR replaces the generic libclang interface with a customized C wrapper around the Clang C++ API, which can simply expose everything Clang knows about the AST. This should allow us to fix some long outstanding issues (e.g., #380, #778) as we can directly query for the correct struct layouts and C++ calling conventions.

This PR only replaces the libclang interface with new, functionally equivalent bindings that should function identically to the libclang-9 API. It does not fix any bugs or change functionality, and should result in no functional changes modulo differences in behavior between libclang-9 and previous versions.

Using the clang libtooling libraries rather than libclang means that we now link against the individual clang libraries for each module (e.g. libclangAST, libclangBasic, etc.). In most LLVM builds, these libraries are built and distributed as static libraries, but we support both static and dynamic linking, in case they are available as shared libraries. I haven't replicated the clang-sys crate's option for dynamic loading of libclang.so at run time, so we have to link against a particular set of clang libraries at build time. Since most LLVM distributions ship these libraries as static libraries, in most cases we couldn't defer library linking to run time, even if we implemented support for it.

This resolves #297.

rinon avatar Mar 03 '20 22:03 rinon

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!

highfive avatar Mar 03 '20 22:03 highfive

Happy to squash if that's preferred, wasn't sure what you'd prefer.

rinon avatar Mar 03 '20 22:03 rinon

Important note: this includes code borrowed and adapted from the LLVM project in src/clang/libclang_compat.cpp. I was careful to mark these as such, but I wanted to make sure this was going to be alright licensing-wise. The LLVM code is licensed under Apache2 with LLVM exceptions.

rinon avatar Mar 03 '20 22:03 rinon

The LLVM 3.8 tests are expected to fail. This is because the llvm.org distribution of LLVM 3.8 was linked against an older libstdc++ than Ubuntu 16.04 now has, so we can't link C++ code against it. We will either need to build LLVM 3.8 from source for testing or find another package that is actually correct.

Unfortunately the Ubuntu distro package of libclang 3.8 is also unusable for this purpose because it ships broken CMake files (see https://bugs.llvm.org/show_bug.cgi?id=23352, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=819072, https://bugs.launchpad.net/ubuntu/+source/llvm-toolchain-3.8/+bug/1387011). On Ubuntu 16.04 we should be able to link against any system libclang-*-dev package newer than 3.8, but this doesn't help for testing 3.8.

rinon avatar Mar 03 '20 23:03 rinon

r? @emilio @fitzgen

rinon avatar Mar 03 '20 23:03 rinon

Thanks for looking into this. My bad on make_shared(), I didn't realize it was C++20 only. I've only tested with somewhat recent compilers so far, and hadn't gotten to anything older. On that note, this is definitely going to need much more testing before even thinking about deployment, which I'm planning to do myself if the code and structure seems at least moderately acceptable. Happy to test on older distros, windows, mac, etc.

I'm surprised that the system clang package didn't work on Fedora. Looks like you need clang-libs in addition to clang-devel. I can test that out later today and make sure there's not a build system issue there.

rinon avatar Mar 04 '20 21:03 rinon

Correction, the C++20 version is only for shared_ptr<U[N]>. The version I was using is C++11, which we definitely require. What Fedora version and compiler is this?

rinon avatar Mar 04 '20 22:03 rinon

Just another testing data point: I was able to build against the system clang libs on a clean Fedora 30 container with: dnf install cmake clang-devel llvm-devel clang-libs. One test "fails" (needs updated expectation) for the system clang 8 libraries because we expect the clang-5 output for clang-8, but clang-8 actually behaves like clang-9 for header_objc_template_h.

rinon avatar Mar 04 '20 22:03 rinon

I'm on Fedora 32. I have clang 10 and gcc 10.0.1.

To be clear make_shared<> compiles just fine, but I somehow did hit a weird linker error...

I do have llvm-libs and clang-libs installed (10.0.0 both, same for -devel and the llvm/clang packages), but llvm-config --libdir points to /usr/lib64 which contains libclang but not the libclangSema.* etc libs.

emilio avatar Mar 04 '20 23:03 emilio

Alright, thanks, I'll test on Fedora 32 and see what's going on with the latest libstdc++.

I do have llvm-libs and clang-libs installed (10.0.0 both, same for -devel and the llvm/clang packages), but llvm-config --libdir points to /usr/lib64 which contains libclang but not the libclangSema.* etc libs.

Ahhh, gotcha. I just saw https://fedoraproject.org/wiki/Changes/Stop-Shipping-Individual-Component-Libraries-In-clang-lib-Package that describes this change. I'll adjust the build system to look for libclang-cpp.so when possible. That will also simplify builds on other distros that ship a C++ shared library for LLVM 9+.

rinon avatar Mar 05 '20 01:03 rinon

I manually replaced the individual libclang* tooling libraries with libclang-cpp in the build script which built and tested just fine with the system clang libraries on Fedora 32. I'll figure out a clean way to determine if we should link against individual clang tooling libs or the new amalgamated libclang-cpp library.

I wasn't able to reproduce the weirdness of make_shared. Could your mozbuild LLVM build have been linking against a different C++ standard library, by any chance? That might do it?

I also investigated the situation on windows. Unfortunately the llvm.org binary distributions don't ship C++ headers or the c++ libraries at all. The only decent options I've come up with so far are recommending that users build llvm from source, or shipping a statically linked binary version of bindgen on windows. Of course, a statically linked binary tool isn't as useful as the crate library, but it would be a lot easier for users to pick up and use.

Just brainstorming, I wonder if it would be possible to maintain both a libclang and a clang tooling frontend interface, and just get less information if only libclang was available. That sounds like a maintenance nightmare to me, personally.

The situation on OS X is basically as easy as linux. I'm not sure if the OS ships the C++ libs but I believe both the homebrew and macports packages do, and the llvm.org package probably does as well.

rinon avatar Mar 05 '20 21:03 rinon

Ahhh, gotcha. I just saw https://fedoraproject.org/wiki/Changes/Stop-Shipping-Individual-Component-Libraries-In-clang-lib-Package that describes this change. I'll adjust the build system to look for libclang-cpp.so when possible. That will also simplify builds on other distros that ship a C++ shared library for LLVM 9+.

Ah, nice find.

I wasn't able to reproduce the weirdness of make_shared. Could your mozbuild LLVM build have been linking against a different C++ standard library, by any chance? That might do it?

I don't think so... Maye sccache / ccache are involved? But anyhow yeah, I'm equally freaked out by that error. This error is very easy to workaround though.

I also investigated the situation on windows. Unfortunately the llvm.org binary distributions don't ship C++ headers or the c++ libraries at all. The only decent options I've come up with so far are recommending that users build llvm from source, or shipping a statically linked binary version of bindgen on windows. Of course, a statically linked binary tool isn't as useful as the crate library, but it would be a lot easier for users to pick up and use.

Ugh, that's really unfortunate... It works for the CLI tool, I guess, but not for all the crates that run bindgen at build-time. I wonder if rustc could ship these libraries somehow...

Just brainstorming, I wonder if it would be possible to maintain both a libclang and a clang tooling frontend interface, and just get less information if only libclang was available. That sounds like a maintenance nightmare to me, personally.

That maybe isn't that crazy as it sounds, if done right... We'd basically replace libclang.so with our own, and add new bindgen APIs... Then we can just use the existing code with all the is_loaded() shenanigans, potentially. So it wouldn't really look that much different than a native "improved" libclang. It really depends on which kinds of changes we want to do, if we want to add a bunch of extra stuff it might become a pain... But maybe that's a good path forward?

The situation on OS X is basically as easy as linux. I'm not sure if the OS ships the C++ libs but I believe both the homebrew and macports packages do, and the llvm.org package probably does as well.

Alright, that's great to know, thank you.

emilio avatar Mar 09 '20 17:03 emilio

I've been working on and discussing distribution of Windows C++ LLVM+Clang libraries on the LLVM mailing list. Haven't gotten many responses there yet, but I'm continuing to try. It's still odd to me that they ship the static C++ libraries for *NIX but not for Windows.

In the mean time, I've thought of a few other options. One option, as you suggested might be for rustc to distribute static versions of the clang and LLVM libs. I think this would substantially increase the size of the compiler packages, so I'm not sure if that would be viable. Another option is to host a Windows build of LLVM+clang suitable for linking C++ projects. This could be a fallback option in case the LLVM release maintainers are opposed to adding the libs to the official packages.

Just brainstorming, I wonder if it would be possible to maintain both a libclang and a clang tooling frontend interface, and just get less information if only libclang was available. That sounds like a maintenance nightmare to me, personally.

That maybe isn't that crazy as it sounds, if done right... We'd basically replace libclang.so with our own, and add new bindgen APIs... Then we can just use the existing code with all the is_loaded() shenanigans, potentially. So it wouldn't really look that much different than a native "improved" libclang. It really depends on which kinds of changes we want to do, if we want to add a bunch of extra stuff it might become a pain... But maybe that's a good path forward?

The more I think about this option the less it appeals. I'm just afraid of the maintenance burden of bugfixes for both APIs and a disjoint set of features and issues depending on how you link against clang. I'm happy to take on some of the maintenance of this new C++ API and associated features, but supporting both might be too much of a time sink. That said, I'm very interested in improving bindgen support for C++, so if all else fails this is still something I'm willing to try.

Just wanted to keep the PR updated, things are still moving on my side.

rinon avatar Mar 26 '20 21:03 rinon

:umbrella: The latest upstream changes (presumably d5cdad2b55300b2dea111a37fc21e589ff9d5c67) made this pull request unmergeable. Please resolve the merge conflicts.

bors-servo avatar May 03 '20 14:05 bors-servo

Is there anything people could do to get this merged?

DemiMarie avatar Jun 11 '22 06:06 DemiMarie

Is there anything people could do to get this merged?

Fixing the existing merge conflicts (thus allowing the test suite to run again) would be a nice start.

kulp avatar Jun 11 '22 12:06 kulp

Hello everyone who has been involved with this PR thus far - we (@pvdrz and I) are beginning to work on some tasks in the project as a part of our contract. This PR is one of the first things we are asked to look at. We want to do a simple check if anyone is still working on this PR that we are unaware of. We do not want to step on anyone's progress and work. Please let us know if:

  • we can take over this PR
  • there is some pending working still yet to be pushed by anyone here
  • any details anyone would like to add

Thank you!

amanjeev avatar Aug 09 '22 15:08 amanjeev

Hey @amanjeev, good to hear there's interest in reviving this. I basically dropped this because I couldn't find a good answer for how to handle windows. There seemed to be a fairly strong argument that we should be able to use the official LLVM windows builds, but those don't ship with a C++ libtooling DLL afaik. The fundamental reason for that is that windows has a hard cap on the number of exported symbols in a DLL that libtooling exceeds. So, where I left this was that we need to go and cut the number of symbols exported there down to fit in a DLL and then possibly maintain both a libclang and a libtooling bindgen interface for awhile until libtooling is widely available.

I'm happy to help out here as well, but I didn't see a good path forward. You're welcome to push on this, maybe you can get further than I have. Keep me posted on potential plans and I'll see what I can do to help.

rinon avatar Aug 09 '22 16:08 rinon

We want to do a simple check if anyone is still working on this PR that we are unaware of.

@amanjeev thanks for checking in, and thanks @rinon for all the work you did to this point.

In #2229 I resolved a number of conflicts and got tests passing with LLVM 9 on macOS and Linux; if you find that branch useful, feel free to do whatever you like with it, including nothing. I am unlikely to work on it further myself.

As far as I am concerned, you can go forth and conquer!

kulp avatar Aug 09 '22 17:08 kulp

Regarding Windows: why does libtooling export so many symbols? This seems like it would be solved with proper use of -fvisibility=hidden. Perhaps bindgen should just statically link to libtooling on Windows?

DemiMarie avatar Aug 10 '22 03:08 DemiMarie

given that we now have the experimental feature we could put this behind it and keep using libclang on windows until we find a good solution. However this would mean that we would be maintaining the new and old clang modules for a while.

pvdrz avatar Feb 10 '23 17:02 pvdrz

given that we now have the experimental feature we could put this behind it and keep using libclang on windows until we find a good solution. However this would mean that we would be maintaining the new and old clang modules for a while.

I agree, putting this behind the experimental will be very helpful.

I finally got around to asking this in Rustlang Zulip. In preliminary messages it is noted that -

Hm... llvm have a script for reducing the number of exported symbols. It was very recently patched to strip out "about 5k symbols" in clang-repl. Not sure if/how that script affects libtooling (I'm not familiar with llvm's build process) but it should be possible to contribute any fixes to llvm. Rust does ship an llvm-tools component though that seems to be for bins atm. I guess in theory we could have an llvm-libs component too but I'm not really the person to ask about that.

I wonder @rinon, if it is worth checking again. Meanwhile, I will see if there is more information on this from Rustlang community.

Update (April 27, 2023): Moved the message on Zulip to t-infra for this question of shipping llvm-libs. https://rust-lang.zulipchat.com/#narrow/stream/242791-t-infra/topic/LLVM.2C.20Libtooling.20DLL.2C.20Windows

amanjeev avatar Mar 09 '23 19:03 amanjeev

This PR should be reopened.

DemiMarie avatar Nov 03 '23 05:11 DemiMarie

@rinon: What is the reason for dynamically linking to libtooling on Windows? To me it seems that the simplest approach is to link statically.

DemiMarie avatar Nov 03 '23 05:11 DemiMarie

Static linking is fine, but the windows LLVM binary distribution didn't include any static libs at the time and they still do not ship any C++ libraries on windows. So the issue comes down to an easily accessible way for users to build bindgen on windows without having to compile Clang and libtooling from sources.

Thinking about this again, it might be viable to ask the LLVM maintainers distributing the windows binaries to include C++ clang static libs? There was also the idea floated of including C++ libtooling as a binary distribution in rustup/toolchain libs.

rinon avatar Nov 04 '23 17:11 rinon

This PR should be reopened.

I suspect (not having talked to @pvdrz though) that the PR was closed automatically by GitHub because the target branch was deleted pursuant to #2286 and #2669.

I think the PR would have to be re-created at this point.

Screenshot 2023-11-04 at 17 35 28

kulp avatar Nov 04 '23 21:11 kulp

Static linking is fine, but the windows LLVM binary distribution didn't include any static libs at the time and they still do not ship any C++ libraries on windows. So the issue comes down to an easily accessible way for users to build bindgen on windows without having to compile Clang and libtooling from sources.

Why is compiling clang and libtooling from source a problem? Could a script that downloads a prebuilt binary be used?

DemiMarie avatar Nov 06 '23 02:11 DemiMarie