Subject: Frequent Crashes of clangd with Unidentified Code Patterns
I'm having trouble with clangd, the Clang language server: it frequently crashes on certain code patterns in one of my projects. I can't tell which patterns trigger the problem, and my other projects work fine. Each time I start clangd, it works normally for about ten seconds and then crashes, after which parsing and jump-to-definition stop working. This is the first time I've run into this, and I'm using clangd version 18.1.3. I've attached the clangd log in the hope that it helps with troubleshooting. Any assistance would be greatly appreciated.
clang-log: clangd.log clangd.log
Does this log cover the time when the crash occurred? I'm not seeing anything related to the crash, such as a backtrace, or evidence of clangd restarting, in it.
Also, just as an aside:
I[03:49:52.713] argv[0]: /home/book/.vscode-server/data/User/globalStorage/llvm-vs-code-extensions.vscode-clangd/install/18.1.3/clangd_18.1.3/bin/clangd
I[03:49:52.713] argv[1]: --compile-commands-dir=/home/book/00_linux/05_sr04/build/
I[03:49:52.713] argv[2]: --query-driver=/home/book/100ask_imx6ull-sdk/ToolChain/arm-buildroot-linux-gnueabihf_sdk-buildroot/bin/arm-buildroot-linux-gnueabihf-g++
I[03:49:52.713] argv[3]: --header-insertion=never
I[03:49:52.713] argv[4]: --clang-tidy
I[03:49:52.713] argv[5]: --background-index
I[03:49:52.713] argv[6]: --completion-style=detailed
I[03:49:52.713] argv[7]: --compile-commands-dir=/home/book/00_linux/05_sr04/build/
I[03:49:52.713] argv[8]: --log=verbose
I[03:49:52.713] argv[9]: --background-index
I[03:49:52.713] argv[10]: --all-scopes-completion
I[03:49:52.713] argv[11]: --completion-style=detailed
I[03:49:52.713] argv[12]: --header-insertion=iwyu
I[03:49:52.713] argv[13]: --pch-storage=memory
I[03:49:52.713] argv[14]: --cross-file-rename
I[03:49:52.713] argv[15]: --enable-config
I[03:49:52.713] argv[16]: --fallback-style=WebKit
I[03:49:52.713] argv[17]: --pretty
I[03:49:52.713] argv[18]: --query-driver=clang++
I[03:49:52.713] argv[19]: --query-driver=/home/book/100ask_imx6ull-sdk/ToolChain/arm-buildroot-linux-gnueabihf_sdk-buildroot/bin/arm-buildroot-linux-gnueabihf-gcc
I[03:49:52.713] argv[20]: --query-driver=/usr/bin/gcc
Specifying --query-driver multiple times like this does not have the effect you likely intend: later arguments override earlier arguments.
If you want to allow-list multiple drivers, you have to do it like this:
--query-driver=/home/book/100ask_imx6ull-sdk/ToolChain/arm-buildroot-linux-gnueabihf_sdk-buildroot/bin/arm-buildroot-linux-gnueabihf-g++,/home/book/100ask_imx6ull-sdk/ToolChain/arm-buildroot-linux-gnueabihf_sdk-buildroot/bin/arm-buildroot-linux-gnueabihf-gcc,/usr/bin/gcc,clang++
Note also that you can use globs, so the first two paths could be collapsed into one as:
--query-driver=/home/book/100ask_imx6ull-sdk/ToolChain/arm-buildroot-linux-gnueabihf_sdk-buildroot/bin/arm-buildroot-linux-gnueabihf-g*,/usr/bin/gcc,clang++
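If the argument list is assembled by a script or editor settings, the merge described above could be sketched roughly as follows. This is only an illustration: merge_query_drivers is a hypothetical helper name, not part of clangd or the VS Code extension, and it assumes the flags always use the --query-driver=VALUE form.

```python
def merge_query_drivers(argv):
    """Collapse repeated --query-driver=... flags into one comma-separated flag.

    Hypothetical helper for illustration; not part of clangd.
    """
    prefix = "--query-driver="
    drivers = []   # unique driver paths/globs, in first-seen order
    merged = []    # resulting argument list
    slot = None    # position in `merged` reserved for the combined flag
    for arg in argv:
        if arg.startswith(prefix):
            for driver in arg[len(prefix):].split(","):
                if driver and driver not in drivers:
                    drivers.append(driver)
            if slot is None:
                slot = len(merged)
                merged.append(None)  # placeholder, filled in below
        else:
            merged.append(arg)
    if slot is not None:
        merged[slot] = prefix + ",".join(drivers)
    return merged


if __name__ == "__main__":
    args = [
        "clangd",
        "--query-driver=/usr/bin/gcc",
        "--clang-tidy",
        "--query-driver=clang++",
    ]
    print(merge_query_drivers(args))
```

The combined flag is kept at the position of the first --query-driver argument, so the order of the remaining options is unchanged.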
To be honest, I don't know whether my clangd has crashed or not, because I haven't found any crash information.
However, as you can see from the image below, clangd seems to be non-functional, and I've been stuck with the issue of code jumping and parsing not working. Here is a new clangd log, and I'm wondering if you can find out the specific problem. It's my first time encountering such a situation. If you can't figure out the cause, would it be possible for you to remotely access my computer to have a look?
clangd.log
Maybe it's a hang rather than a crash?
Can you attach a debugger to the clangd process and get a stack trace to see where it's hanging?
Could you please tell me how to do this? Do you have a tutorial for it? Or is it possible for you to remotely access my computer to check it?
I used the ps command and it turned out that the clangd process was not suspended. Another strange thing is that header files can all be navigated normally.
book 807 0.0 0.0 1675596 0 pts/0 Sl+ 03:08 0:02 /home/book/.vscode-server/data/User/globalStorage/llvm-vs-code-extensions.vscode-clangd/install/18.1.3/clangd_18.1.3/bin/clangd ...
book 96877 0.1 0.0 1591464 13740 pts/0 Sl+ 04:38 0:03 /home/book/.vscode-server/data/User/globalStorage/llvm-vs-code-extensions.vscode-clangd/install/18.1.3/clangd_18.1.3/bin/clangd ...
book 149017 0.0 0.0 4032 2020 pts/10 S+ 05:23 0:00 grep --color=auto clangd
It seems I have found the source of the issue, although I'm not sure whether it counts as a minor bug in clangd. The most likely cause is that the include guards in my header files have led to circular dependencies, and neither the generated compile_commands.json nor clangd can detect this situation, which results in a deadlock.
I would consider clangd getting into a deadlock on any input (source code) to be a bug.
If you are able to share a code example that reproduces the problem for further investigation, that would be helpful.
I'm very sorry, I misjudged the issue earlier, and the error still persists. I will send you my project package, and I have configured it so you can compile it directly.
The specific steps needed to compile are as follows:
- Extract the two compressed files.
- In the file 05_sr04/src/Makefile, change the line KERN_DIR ?= /home/book/100ask_imx6ull-sdk/Linux-4.9.88 to the absolute path of your extracted Linux-4.9.88 directory.
- Navigate to the 05_sr04/src directory (where the Makefile is) and generate the compile_commands.json file using either compiledb make or bear -- make.
- In driver_drv.c, pick any function and attempt to jump to its definition. You will notice that clangd freezes.
I hope this helps you reproduce the problem.
You can get the code from this GitHub repository: https://github.com/DYAAL/clangd_for_help. Alternatively, you can clone it directly via git@github.com:DYAAL/clangd_for_help.git.
To compile, you also need the relevant toolchain and the following environment settings:
export ARCH=arm
export CROSS_COMPILE=arm-buildroot-linux-gnueabihf-
export PATH=$PATH:/to/your/path/of/tool/chain/ToolChain/arm-buildroot-linux-gnueabihf_sdk-buildroot/bin
Currently, clangd is occupying all of my CPU resources, and every time my mouse moves to this position, clangd freezes, which is a rather serious bug. However, I'm not sure what is causing it.
After I close the current VSCode,
Thank you for sharing reproduction steps.
However since we're talking about a non-trivial project with toolchain dependencies and such, it would be good to try some quicker diagnostic steps first, as suggested earlier:
Can you attach a debugger to the clangd process and get a stack trace to see where it's hanging?
Could you guide me through the specific steps to do this?
- When the hang occurs, run top and take note of the PID of the hanging clangd process.
- Run gdb attach <PID> in a terminal.
- Run thread apply all backtrace and attach the output here.
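For anyone who prefers to capture the backtraces non-interactively, the steps above could be sketched as a small script. This is a minimal sketch rather than an official procedure: the function names and the output path are made up for illustration, and it assumes gdb is installed and that you have permission to attach to the process (on some systems this needs elevated privileges).

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid`, prints every
    thread's backtrace, and exits (batch mode detaches on exit)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all backtrace"]


def dump_backtraces(pid, out_path):
    """Run gdb against `pid` and write its output to `out_path`.

    Illustrative helper; the file name is whatever the caller chooses.
    """
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_backtrace_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,  # gdb may exit non-zero if attaching fails
        )


if __name__ == "__main__":
    # 12345 is a placeholder; substitute the PID noted from top.
    print(" ".join(gdb_backtrace_command(12345)))
```

Calling dump_backtraces(pid, "clangd_back_trace.log") would produce a file that can be attached to the thread directly.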
I followed your instructions and printed out all the debugging information. Unfortunately, I'm not familiar with this area of knowledge, so I don't have the ability to analyze the cause of the error on my own. Upon a brief inspection, it seems to be related to epoll_wait getting stuck.
I've sent you the printed information in hopes that it might be helpful to you. clangd_back_trace.log
I think you may have attached to the vscode process rather than the clangd process, because I'm seeing things related to node in those backtraces (and nothing related to clangd).
I used the command ps -ajx | grep clangd and got the following output:
186910 186956 186817 186817 pts/0 186817 Sl+ 1000 0:00 /home/book/.vscode-server/data/User/globalStorage/llvm-vs-code-extensions.vscode-clangd/install/18.1.3/clangd_18.1.3/bin/clangd --compile-commands-dir=/home/book/00_linux/05_sr04/build/ --query-driver=/home/book/100ask_imx6ull-sdk/ToolChain/arm-buildroot-linux-gnueabihf_sdk-buildroot/bin/arm-buildroot-linux-gnueabihf-g++ --header-insertion=never --clang-tidy --background-index --completion-style=detailed --compile-commands-dir=/home/book/00_linux/05_sr04/build/ --log=verbose --background-index --all-scopes-completion --completion-style=detailed --header-insertion=iwyu --pch-storage=memory --cross-file-rename --enable-config --fallback-style=WebKit --pretty --query-driver=clang++ --query-driver=/home/book/100ask_imx6ull-sdk/ToolChain/arm-buildroot-linux-gnueabihf_sdk-buildroot/bin/arm-buildroot-linux-gnueabihf-gcc --query-driver=/usr/bin/gcc
134987 191654 191653 134987 pts/15 191653 S+ 1000 0:00 grep --color=auto clangd
Is 186910 the process ID you are looking for?
I tried to restart the clangd-server and noticed that clangd.main appeared briefly but then exited. I found that the process ID was 194955, which happens to be the parent process of the previous process.
Anyway, I tried to print the information of process 194955. Can you check if this is what you need? clangd_back_trace.log
Yeah, the first number printed by ps (at least with these flags) is the PID of the process's parent, and the second number is the process's own PID. This explains why the first time you attached to the vscode process, since that's what launches the clangd process.
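To make the column order concrete, here is a small sketch that picks the parent PID and the process's own PID out of one line of ps -ajx output. It assumes the ten-column layout seen in the output above (parent PID first, own PID second, command last); parse_ps_ajx_line is a hypothetical name used only for this example.

```python
def parse_ps_ajx_line(line):
    """Split one `ps -ajx` output line into parent PID, own PID, and command.

    Assumes the ten-column layout shown in this thread:
    PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
    """
    # Split on whitespace at most 9 times so the command (which itself
    # contains spaces) stays intact as the tenth field.
    fields = line.split(None, 9)
    return {
        "ppid": int(fields[0]),
        "pid": int(fields[1]),
        "command": fields[9],
    }


if __name__ == "__main__":
    sample = "186910 186956 186817 186817 pts/0 186817 Sl+ 1000 0:00 /path/to/clangd --log=verbose"
    info = parse_ps_ajx_line(sample)
    print("attach gdb to PID", info["pid"], "(parent is", info["ppid"], ")")
```

For the line posted earlier, this gives 186956 as the clangd PID to attach to, with 186910 being its parent.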
The new backtrace does come from the clangd process. However... are you sure clangd was utilizing CPU at the time you attached the debugger? In this backtrace, all threads are either in std::condition_variable::wait() or, in the case of the main thread, waiting for standard input in read() -- none of them are actually doing any work, so this is not consistent with clangd having a high CPU utilization.
Since the last time, it seems that clangd hasn't been consuming a lot of CPU, but it's still not working. The situation is the same as before. I feel like this is a very rare bug because it's the first time I've encountered it.
Anyway, I later created a new project and migrated the code, and the problem with clangd did not reappear. However, the issue still persists in the previous project.
If you need more information, please contact me, or you can try to reproduce the problem using the code saved on my GitHub (if you think it's necessary). It shouldn't take more than ten minutes of your time.
In any case, thank you very much for your help.
Since the last time, it seems that clangd hasn't been consuming a lot of CPU, but it's still not working.
Could you please clarify in what way it's not working?
I still don't understand the cause of this issue. I haven't found a solution in the original project, but when I created a new project and migrated very similar code, the issue didn't occur again.
I suspect it might be related to how compile_commands.json is generated, but I've tried multiple tools to generate it and still ran into the same situation. I even tried on a different computer, and the issue persisted.
In summary, you can download my code from GitHub, compile it, and reproduce the issue.
In the affected project, code navigation, code suggestions, and code parsing all fail to work, while clang-tidy still seems to work as expected.
Could you try the following please:
- In the affected workspace, close all open files in the editor
- Restart clangd
- Open a single file in which navigation does not work
- Perform a single navigation action such as go-to-definition
- Attach logs from the above clangd session (starting from clangd startup when it was restarted)
The idea is to see what is happening in the logs at the time the navigation action which does not work is performed. The reason for closing all other files and restarting clangd is to make sure that the logs will only contain information from the affected file, so it's easier to see what's going on.
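When reading such a log, one quick check is whether every request received a reply. The sketch below scans log lines for requests that never got one. It rests on an assumption about the log format: that incoming requests appear as "<-- method(id)" and replies as "--> reply:method(id)". The sample lines are invented for illustration and are not taken from the attached logs.

```python
import re

# Assumed line shapes (an assumption about clangd's log format):
#   I[03:50:01.000] <-- textDocument/definition(7)
#   I[03:50:01.004] --> reply:textDocument/definition(7) 4 ms
REQUEST = re.compile(r"<-- (\S+)\((\d+)\)")
REPLY = re.compile(r"--> reply:(\S+)\((\d+)\)")


def unanswered_requests(log_lines):
    """Return {request id: method} for requests with no reply in the log."""
    pending = {}
    for line in log_lines:
        reply = REPLY.search(line)
        if reply:
            pending.pop(int(reply.group(2)), None)
            continue
        request = REQUEST.search(line)
        # Skip the client's replies to server-initiated requests.
        if request and request.group(1) != "reply":
            pending[int(request.group(2))] = request.group(1)
    return pending


if __name__ == "__main__":
    # Invented sample lines, only to show the shape of the input.
    sample = [
        "I[03:50:01.000] <-- textDocument/definition(7)",
        "I[03:50:01.004] --> reply:textDocument/definition(7) 4 ms",
        "I[03:50:05.000] <-- textDocument/hover(8)",
    ]
    print(unanswered_requests(sample))
```

A request that stays in the result long after it was sent would point at a hang, whereas an empty result suggests clangd is answering and the problem lies in what it answers.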
Here is what I did following your instructions and printed the new clangd log: clangd-log.txt I hope this helps you find the problem.
What I'm seeing in this log is that you are performing go-to-definition on a line that looks like this:
#define MAX_MINOR (10)
with the cursor over the MAX_MINOR token.
Since the token under the cursor names a macro and this is the definition of the macro, clangd returns the same location.
What is unexpected here?
However, at this point, clangd has already frozen, and performing other operations has no response either. For example, at this moment, when I try to jump to the definition of a function called my_open, it remains frozen.
Furthermore, all jump-to-definition and parsing hint functionalities have been lost. clangd-log.txt
Here is the new clangd log.
For example, at this moment, when I try to jump to the definition of a function called my_open, it remains frozen.
Which file and line are you jumping from?
I followed your instructions and opened the current file main_bus.c, specifically the my_open function in the main_bus.c file.
I do see in the log that go-to-definition is invoked on the my_open token on line 18, but this is the same kind of situation as with MAX_MINOR before -- you're already at the function's definition, so "go to definition" just takes you to the same place you're starting from.
So... still not seeing anything that's wrong.
You can test the software I sent you. I believe you should be able to find the problem quickly.