clang_complete Improve code completion performance

Open vimnut opened this issue 13 years ago • 43 comments

Hello,

First of all, thank you so much for your awesome work on libclang!

I test clang_complete with the clang codebase itself. It's reasonably large, and llvm is its sole dependency. The code completion is working and is very accurate, however it takes about 8 seconds on the first completion, and then 2 to 3 seconds for all subsequent uses.

My question is what can I do to improve the performance?

Thank you.

Jan 20 '11 03:01 vimnut

If you're using the svn version of clang, you could try something. In the libclang.py file, on line 34, add a fourth argument with the value 0x14 to the function call. The line should then look like this:

tu = index.parse(fileName, args, [currentFile], 0x14)

This should enable the use of automatic precompiled header (PCH). However, some bugs may appear, such as not taking new includes into account.
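For reference, 0x14 is a bitmask rather than a magic number. A minimal sketch of how it decomposes, assuming the flag values from libclang's CXTranslationUnit_Flags of that era (the Python names below are illustrative, and newer libclang releases may assign these bits differently):

```python
# Flag values assumed from libclang's CXTranslationUnit_Flags at the time.
PRECOMPILED_PREAMBLE = 0x04      # cache the leading #include block as a PCH
CACHE_COMPLETION_RESULTS = 0x08  # not part of 0x14, listed only for comparison
CXX_PRECOMPILED_PREAMBLE = 0x10  # allow the preamble PCH for C++ sources

# Combining the two preamble flags gives the value passed to index.parse().
flags = PRECOMPILED_PREAMBLE | CXX_PRECOMPILED_PREAMBLE
print(hex(flags))  # prints 0x14
```

Spelling the flags out by name makes it easier to see later which bit is being toggled when experimenting.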

The other solution is to use hand-generated PCH files; however, I don't think it will work with libclang.

Jan 20 '11 20:01 xavierd

Rip-Rip,

I have added the 0x14 argument to index.parse() but it doesn't make any difference at all on my machine.

> The other solution is to use hand-generated PCH files; however, I don't think it will work with libclang.

Too bad libclang doesn't support custom .pch files... Headers from Boost, Qt, and the STL should not need to be re-parsed in every editing session.

Thank you for your help.

Jan 21 '11 04:01 vimnut

I am a little confused. Are you using the libclang.py code completion? On which hardware? I get about 1 sec for the first and around 0.2 sec for all subsequent completions on LLVM's lib/Analysis/ConstantFolding.cpp. I believe you can use PCH with libclang. Did you try adding the relevant command line flags to the user-provided flags? However, I do not expect it to be any faster than plain libclang, as it internally uses quite a good caching approach that automatically creates a PCH.

Jan 21 '11 13:01 tobiasgrosser

> I am a little confused. Are you using the libclang.py code completion? On which hardware? I get about 1 sec for the first and around 0.2 sec for all subsequent completions on LLVM's lib/Analysis/ConstantFolding.cpp.

I'm sure I'm using libclang since I didn't set the g:clang_exec option.

My .vimrc:

let g:clang_use_library = 1
let g:clang_library_path = "path to libclang"
let g:clang_user_options = '-I/Users/vimnut/llvm/include 2> NUL || exit 0"'

When I try completing a method in ConstantFolding.cpp, the first completion takes 5s, and all subsequent completions take about 1s.

My computer has a 2GHz Core 2 duo with OS X 10.6 as its OS.

Is my computer too slow?

Thank you.

Jan 22 '11 09:01 vimnut

I have the same problem. On my machine, libclang is faster than the clang binary (it takes about 1/4 of the time), going from about 5 seconds to 1.2 seconds (after the first parse, of course).

But if I use the clang binary with a PCH, then I gain a little bit more than with libclang, but not much (0.9 seconds instead of 1.2). Adding -include-pch to g:clang_user_options breaks libclang though (it cannot parse it).
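For anyone wanting to reproduce the hand-made PCH setup with the clang binary, here is a rough sketch of the two command lines involved. It only builds the argument lists and does not run anything; the file names pch.h and test.cpp are hypothetical, and the exact options clang_complete passes for completion are not shown here:

```python
import shlex

def pch_commands(header, source, clang="clang"):
    """Return (build_pch, use_pch) argument lists; nothing is executed."""
    pch = header + ".pch"
    # Compile the header as a C++ header to produce the precompiled file.
    build_pch = [clang, "-x", "c++-header", header, "-o", pch]
    # Reuse the precompiled header when processing the source file.
    use_pch = [clang, "-fsyntax-only", "-include-pch", pch, source]
    return build_pch, use_pch

build, use = pch_commands("pch.h", "test.cpp")
print(" ".join(shlex.quote(arg) for arg in build))
print(" ".join(shlex.quote(arg) for arg in use))
```

The header would typically contain the heavy includes (Boost, the STL, and so on) so they are parsed only once.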

I wonder if we can gain more speed than that.

The example file I'm doing my tests on is the following:

#include <cstdlib>
#include <iostream>

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/array.hpp>
#include <boost/signal.hpp>
#include <boost/function.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/noncopyable.hpp>
#include <boost/aligned_storage.hpp>
#include <boost/enable_shared_from_this.hpp>

int main()
{
    boost::
}

Also, Rip-Rip's 0x14 flag trick doesn't change anything for me.

Jan 22 '11 18:01 Silex

Hi Silex,

For your example I get about 0.3 sec for the first run and almost no overhead for subsequent completions. Did you both compile clang yourself? Did you perhaps compile with debug mode enabled? My machine is an i5-520 @ 2.40GHz.

Cheers Tobi

Jan 22 '11 22:01 tobiasgrosser

I compiled like "cmake -D CMAKE_BUILD_TYPE=Release" followed by "make libclang". I verified that it was compiled using -O3, but maybe I should try the static way as mentioned on the wiki?

I have a Core2Duo @ 2GHz, but even at work where I have a monster it takes time, so I think there is something fishy somewhere. Either your completion results are inaccurate (some of the headers aren't parsed) or we build libclang wrong. Could you maybe enhance/correct the step-by-step example I tried to build at https://github.com/Rip-Rip/clang_complete/wiki/Libclang ?

By the way I'm trying to make it so we can tweak how libclang sorts the results (priority or alphabetically) that way it's easier to compare the binary vs libclang output about the completions.

Here is some :profile'ing I did to check that it was really libclang and not some other silly part of the script wasting time. Here are the results:

http://ideone.com/7qguh

It's interesting to note that almost half a second is spent between vim & python to pass the full list of results.
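To separate the cost of producing the results from the cost of handing them to Vim, each stage can be timed on its own. A minimal sketch, using made-up stand-in data instead of real libclang results and a plain string conversion as a stand-in for the Vim hand-off:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Record the wall-clock duration of one stage under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

# Stand-in data: 50,000 fake completion entries, not real libclang output.
with stage("build results"):
    results = [{"word": "symbol%d" % i, "menu": "int"} for i in range(50000)]

# Stand-in for passing the list to Vim: serialise it into one big string.
with stage("pass to vim"):
    payload = str(results)

for name, seconds in timings.items():
    print("%-14s %.4fs" % (name, seconds))
```

Wrapping the real libclang call and the real Vim hand-off in the same way would show which side dominates.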

Jan 22 '11 23:01 Silex

Btw, maybe we should be more specific about versions:

vim: 7.3
boost: 1.45
cmake: 2.8.3
llvm/clang: latest from svn
python: 2.6.6 or 2.7 depending on where I test
os: Ubuntu, Windows XP & OSX 10.6

More or less same perfs everywhere.

Jan 22 '11 23:01 Silex

Here are my results on an old C2D 1.8GHz.

With the 0x14 thing:
First pass: 1.5s
Generate PCH: 6s
Completion: 0.18s

Without 0x14:
First pass: 1.5s
Generate PCH: 1.5s
Completion: 1.0s

Plus I don't think PCH files are generated for me (without 0x14), because if I look at libclang output of LIBCLANG_TIMING=1 gvim test.cc, there is no "Precompiling preamble" line...

Jan 23 '11 11:01 xavierd

Ah, LIBCLANG_TIMING... seems we learn new tricks everyday!

@Rip-Rip: do you use the same versions as I do?

Jan 23 '11 14:01 Silex

Ok here is the output with LIBCLANG_TIMING=1:

http://www.ideone.com/vchEJ

There's indeed a difference with the 0x14 flag, but none timing-wise. There has to be something obvious we're missing :) I'll try to compile libclang differently.

Edit: could one of you guys upload his "fast" libclang.so so I can test it on my machine?

Jan 23 '11 15:01 Silex

@Silex: Yes I use the same versions.

Can you test your example with this: https://gist.github.com/758615 It's a very simple program that wraps clang_codeCompleteAt(), which is easier to use than directly modifying clang_complete :).

Do you need a 32-bit or a 64-bit libclang.so?

Jan 23 '11 18:01 xavierd

Ok, here are my tests with ./complete:

http://ideone.com/icEnH

We see that indeed after the reparse it takes only 0.2203 seconds, so that means the vim equivalent is somewhat broken! That's good news, now we only need to find where :)

Ignore what I said about getting libclang.so, as this test clearly shows this is not the problem.

I suggest you delete the https://github.com/Rip-Rip/clang_complete/wiki/Libclang page or maybe merge it with the main wiki page.

Jan 23 '11 19:01 Silex

HA! I think I discovered what is causing the slow perfs. If you only have libclang.so in the lib path, then it's slow. If you have all the other llvm libs, then it's fast. My guess is that libclang.so tries to load a cache-helper dll, fails, and so reparses every time when alone, and works as expected (fast) when not alone.

I'll try to figure what dll is wanted, atm I have completions in 0.3 seconds for just libclang and 0.6 seconds within vim (the time to pass the results probably).

Jan 23 '11 20:01 Silex

OMG... as surprising as it is, it's slower because it doesn't have... the clang/2.9/include subdirectory, which only contains some .h files.

I guess libclang parses some of those and eventually this affects the cache.

I think we can close this issue, but we'll need to make the wiki docs better about this.

Jan 23 '11 20:01 Silex

@Silex: It would be great to completely understand this issue. Can you check which include files it is touching. Maybe with some strace command. I do not see an obvious reason it needs those files. Maybe we can fix a problem in libclang

Jan 24 '11 01:01 tobiasgrosser

OK, I did some tracing. Basically I can see that quite a lot of headers from the dir are read, and that this results in a PCH being created in some temporary file. Without those headers it looks like the system headers are read instead, and no PCH is created as a result.

If you're interested, take a look at http://unitedsoft.ch/trace.7z

I'll try to see tomorrow if libclang's source has something to say about it. It's probably something pretty explicit in libclang's source.

Jan 24 '11 23:01 Silex

Hum, I see I forgot about this problem.

I think it's safe to say that the original issue (clang_complete's performance) is now addressed with libclang, and for those who insist on using the clang binary they can manually create precompiled header files.

Now about the libclang.so that only creates a cache if it can touch header files in some subdirectory, do you want me to investigate it? It's not really an issue as a workaround was found, and it's more a problem related to clang that'll eventually fix itself. Maybe it's even already fixed, who knows :)

May 01 '11 20:05 Silex

Seems I have the same problem. ./complete results for my system: https://gist.github.com/973573 They look slower than @Silex's (my hardware is a Core i3 3GHz, 2GB RAM). I didn't build clang by hand; instead I just emerged it from the portage ~amd64 tree with the -debug use flag. Could you tell me more about the workaround? Where can I get the clang/2.9/include directory?

Thanks.

May 15 '11 21:05 mrsmith

Well, try to locate it? On my machine it was in /usr/local/lib/clang/2.9/include, so try something similar on yours :)

Otherwise I could try to upload them somewhere for you to try out, tell me. The workaround is simply to have them accessible in a directory underneath libclang.so, this is what I have:

├── clang
│   └── 2.9
│       └── include
│           ├── altivec.h
│           ├── arm_neon.h
│           ├── avxintrin.h
│           ├── emmintrin.h
│           ├── float.h
│           ├── immintrin.h
│           ├── iso646.h
│           ├── limits.h
│           ├── mm_malloc.h
│           ├── mmintrin.h
│           ├── nmmintrin.h
│           ├── pmmintrin.h
│           ├── smmintrin.h
│           ├── stdarg.h
│           ├── stdbool.h
│           ├── stddef.h
│           ├── stdint.h
│           ├── tgmath.h
│           ├── tmmintrin.h
│           ├── varargs.h
│           ├── wmmintrin.h
│           ├── x86intrin.h
│           └── xmmintrin.h
├── clang.exe
├── libclang.dll
├── libclang.dll.mingw
├── libclang.dll.msvc
├── libclang.dylib
└── libclang.so

Then I just have g:clang_library_path point at the root there.
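A quick way to check whether a given library directory has this layout is to look for clang/<version>/include next to the library. The sketch below is only an illustration: the function name is made up, and using stddef.h as the marker file is an assumption based on the listing above.

```python
import glob
import os
import tempfile

def find_builtin_include_dirs(library_path):
    """Return clang/<version>/include directories found under library_path."""
    pattern = os.path.join(library_path, "clang", "*", "include")
    # Treat a directory as valid only if it holds a known builtin header.
    return sorted(d for d in glob.glob(pattern)
                  if os.path.isfile(os.path.join(d, "stddef.h")))

# Demo on a throwaway directory that mimics the tree above.
with tempfile.TemporaryDirectory() as root:
    include_dir = os.path.join(root, "clang", "2.9", "include")
    os.makedirs(include_dir)
    open(os.path.join(include_dir, "stddef.h"), "w").close()
    found = find_builtin_include_dirs(root)
    print("builtin headers found" if found else "builtin headers missing")
```

Pointing the function at the directory used for g:clang_library_path would tell you whether the headers sit where this workaround expects them.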

May 15 '11 22:05 Silex

Ok, found it. I was expecting it in the /usr/include directory, but found it in /usr/lib64/clang. So maybe this is the maximum speed for me.

May 16 '11 09:05 mrsmith

So libclang.so has the clang/2.9/include subdirectory with the headers? Weird, you should get more speed than that imho. I could try uploading my version for you to test?

May 16 '11 09:05 Silex

I made a bug report, we'll see how it goes (http://llvm.org/bugs/show_bug.cgi?id=9926).

May 16 '11 10:05 Silex

Ok, we'll see. I tried putting a symlink to clang/2.9/include next to libclang.so to get the same structure as yours, but the benchmark results are the same. So I suppose headers and paths work right out of the box for my distro.

I'll try with this flag uncommented:

flags = TranslationUnit.PrecompiledPreamble | TranslationUnit.CXXPrecompiledPreamble # | TranslationUnit.CacheCompletionResults

May 16 '11 13:05 mrsmith

So if these three flags are passed in to libclang, does that make the PCH section of the help file obsolete?

Aug 28 '11 07:08 exclipy

The PCH section of the help file is useful if you use the clang binary instead of libclang (g:clang_use_library).

Aug 28 '11 10:08 Silex

What do you think of persisting the translation unit AST on disk so we don't have to reparse it if Vim is closed? libclang provides clang_saveTranslationUnit() and clang_createTranslationUnit to save and load translation units.

Perhaps this would be activated if the user supplies a directory to store them. Then you could dump all the TUs in there like how undofile/undodir work.
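The undodir-style layout could look roughly like the sketch below. It only covers the file naming and a simple freshness check; the actual saving and loading would go through the libclang functions mentioned above. The function names and the ".tu" suffix are made up for illustration.

```python
import os

def tu_cache_path(cache_dir, source_file, suffix=".tu"):
    """Map a source file to a flat cache file name, similar to Vim's undodir."""
    absolute = os.path.abspath(source_file)
    # Flatten the path so every source file gets a unique name in one directory.
    flat = absolute.replace(os.sep, "%")
    return os.path.join(cache_dir, flat + suffix)

def cache_is_fresh(cache_file, source_file):
    """True if the cache exists and is at least as new as the source file."""
    try:
        return os.path.getmtime(cache_file) >= os.path.getmtime(source_file)
    except OSError:
        return False

print(tu_cache_path("/tmp/clang_tu_cache", "/home/user/project/main.cpp"))
```

Note that this freshness check only looks at the main file; a real implementation would also have to notice changes in included headers.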

Oct 15 '11 23:10 exclipy

Hi exclipy,

I think caching stuff on disk might be a solution, but before we go ahead and try this, I believe we should check if we can cache files as soon as they are loaded. This is a lot easier to implement and should in most cases yield the same result. I went ahead and implemented this in my clang_complete branch (https://github.com/tobig/clang_complete). Give it a try and let me know what you think.
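The idea of caching at load time can be sketched as an in-memory table of parsed units that is filled when a buffer is opened, so the first completion finds the work already done. This is not the code from the branch; the class and the stand-in parser below are hypothetical.

```python
class TranslationUnitCache:
    """Keep one parsed translation unit per file for the editor session."""

    def __init__(self, parse):
        self._parse = parse   # callable: (file_name, args) -> parsed unit
        self._units = {}

    def get(self, file_name, args=()):
        # Parse only on the first request for this file and argument set.
        key = (file_name, tuple(args))
        if key not in self._units:
            self._units[key] = self._parse(file_name, list(args))
        return self._units[key]

# Hypothetical stand-in parser that just records how often it is called.
calls = []
def fake_parse(file_name, args):
    calls.append(file_name)
    return {"file": file_name, "args": args}

cache = TranslationUnitCache(fake_parse)
cache.get("test.cpp", ["-I/opt/local/include"])  # e.g. when the buffer loads
cache.get("test.cpp", ["-I/opt/local/include"])  # e.g. at the first completion
print(len(calls))  # the file was parsed once, so this prints 1
```

In an editor plugin, the first get() would be triggered from a buffer-load hook and the second from the completion function.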

Oct 17 '11 08:10 tobiasgrosser

this issue still seems to be true -- at least in my world. maybe this could be mentioned more prominently?

here is the corresponding snippet from my vimrc:

"clang...
" git clone http://llvm.org/git/llvm.git $HOME/llvm.git
" git clone http://llvm.org/git/clang.git $HOME/llvm.git/tools
" mkdir -p $HOME/llvm.git/build && cd $HOME/llvm.git/build
" ../configure
" make -j9 ENABLE_OPTIMIZED=1 DISABLE_ASSERTIONS=1
" nnoremap <F5> :call g:ClangUpdateQuickFix()<CR>
let g:clang_periodic_quickfix=1
let g:clang_auto_select=1
if filereadable(expand("~/llvm.git/build/Release/lib/libclang.so"))
  let g:clang_use_library=1
  let g:clang_user_options="-I".expand("~/llvm.git/build/Release/lib/clang/3.1/include")
  let g:clang_library_path=expand("~/llvm.git/build/Release/lib")
endif

and my .clang_complete:

-I$HOME/llvm.install/lib/clang/3.1/include
-I/usr/include/qt4/QtCore
-I/usr/include/qt4/QtGui
-I/usr/include/qt4

Nov 14 '11 08:11 marvin2k

Hi Marvin,

let's get a common base. Can you compile ./complete as provided in https://github.com/Rip-Rip/clang_complete/issues/17#issuecomment-700680 and measure the performance of the example in https://github.com/Rip-Rip/clang_complete/issues/17#issuecomment-699077 ?

Call it like this, and check if you get something similar

LIBCLANG_TIMING=1 ./complete test.cpp 16 12 -I/opt/local/include > /dev/null

Parsing test.cpp: 1.4867 (100.0%) 0.2630 (100.0%) 1.7497 (100.0%) 1.9440 (100.0%)
Precompiling preamble: 2.3379 (100.0%) 0.5787 (100.0%) 2.9166 (100.0%) 3.1513 (100.0%)
Reparsing test.cpp: 2.4849 (100.0%) 0.6816 (100.0%) 3.1665 (100.0%) 3.4026 (100.0%)
Code completion @ test.cpp:16:12: 0.1649 (100.0%) 0.0546 (100.0%) 0.2195 (100.0%) 0.2203 (100.0%)
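To compare such runs across machines, the timing lines can be pulled apart with a few lines of Python. This is a sketch for the condensed "label: numbers" form shown above, not for the raw libclang output. Reading the four columns as user, system, user+system, and wall-clock time is an assumption based on LLVM's usual timer layout.

```python
import re

def parse_timing_line(line):
    """Split 'label: t1 (p%) t2 (p%) ...' into (label, [t1, t2, ...])."""
    # The label may itself contain colons (file:line:col), so split on the
    # last ": " only.
    label, _, rest = line.rpartition(": ")
    seconds = [float(t) for t in re.findall(r"(\d+\.\d+) \(", rest)]
    return label, seconds

sample = ("Code completion @ test.cpp:16:12: 0.1649 (100.0%) 0.0546 (100.0%) "
          "0.2195 (100.0%) 0.2203 (100.0%)")
label, seconds = parse_timing_line(sample)
print(label, seconds[-1])  # prints: Code completion @ test.cpp:16:12 0.2203
```

The last column is the figure quoted earlier in this thread as the completion time after the reparse.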

Also, what is the issue you experience? For which location in which test case do you get what performance? What do you expect instead?

Cheers Tobi

Nov 15 '11 09:11 tobiasgrosser

yes, no problem. compilation:

g++ complete.cc -o complete -lclang -L$HOME/llvm.install/lib -I$HOME/llvm.install/include

tested on 3 systems, each time first with additional includes (working and fast) and then without (works too, but slower). also some system information.

http://pastebin.com/32fck241 (quadcore 2.8ghz, 6gb ram -- 0.13s vs 0.81s)
http://pastebin.com/RKVpjVCz (dualcore 2.66, 4gb ram -- 0.16s vs 0.85s)
http://pastebin.com/LHek6JiH (dualcore 1.6, 3gb ram -- 0.27s vs 1.8s)

what more to say?

ah: thanks for the plugin, the first completion for vim which really works ;-)

Nov 15 '11 17:11 marvin2k

Hey marvin2k,

I did not realize you were referring to the include path problem.

I think the include path problem looks very much like a clang bug. I think after your detailed evaluation, the clang developers should be able to fix it easily. Would you mind creating a bug report at http://www.llvm.org/bugs/ ? If possible, include the source of the ./complete program, the example C file, the command lines you use, your performance results, and everything else you think might be useful to reproduce this bug.

Please also copy me in the relevant bug report.

Thanks a lot Tobi

Nov 17 '11 16:11 tobiasgrosser

I already made a bug report a while ago: http://llvm.org/bugs/show_bug.cgi?id=9926

But perhaps it should be improved, feel free to contribute to it.

Nov 17 '11 17:11 Silex

hm, thanks for reporting at llvm. I'll keep my mouth shut, this should be enough to point the ones who know in the right direction ;-)

Nov 22 '11 11:11 marvin2k

Well, it'd be nice if you at least said "it happens to me too", because it seems they don't care much about this bug report. Also my data is maybe not really usable...

Nov 22 '11 12:11 Silex

I'm pretty sure I don't have the same root cause for slowness, but I do have very slow completions for the simplest of files. I'm using clang_complete with clang 3.0 built from Homebrew on Lion 10.7.4 and MacVim HEAD linked against Homebrew Python 2.7.3.

After going through the numerous ways of getting timing on the completion parts I've narrowed down the issue to results = filter(lambda x: regexp.match(getAbbr(x.string)), results) in getCurrentCompletions from libclang.py. Prior to this call and after it the whole pipe takes roughly .3s (most of which is a delay after completion results are fetched and waiting for the thread to die). During this call the time jumps to an additional 1.6s.

I broke the call down to the getAbbr call and believe it must be because of marshaling data from C to python in the x.string call. At least that is my assumption from the surface. Is there any way to speed this up?
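The reasoning can be illustrated with a small stand-in: the filter has to read .string for every result, so the cost grows with the total number of candidates, not with the number of matches. The classes and data below are fake and only count accesses. They do not call libclang, and get_abbr() is a placeholder for the real getAbbr().

```python
import re

class FakeResult:
    """Stand-in for a libclang completion result; counts .string accesses."""
    accesses = 0

    def __init__(self, text):
        self._text = text

    @property
    def string(self):
        FakeResult.accesses += 1   # in the real bindings this crosses into C
        return self._text

def get_abbr(completion_string):
    # Placeholder for getAbbr(); the real one walks the completion chunks.
    return completion_string

results = [FakeResult("symbol%d" % i) for i in range(10000)]
results.append(FakeResult("kCFAllocatorDefault"))

regexp = re.compile("^kCFAllo")
# Fetch each abbreviation exactly once and keep it paired with its result,
# so later sorting or formatting does not have to read .string again.
decorated = [(get_abbr(r.string), r) for r in results]
matches = [(abbr, r) for abbr, r in decorated if regexp.match(abbr)]

print(len(matches), FakeResult.accesses)  # prints: 1 10001
```

Keeping the abbreviation next to the result avoids paying for the same access twice, but it cannot remove the one access per candidate that the filter itself needs.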

My simple test file consists of this.

#import <Foundation/Foundation.h>

int main(int argc, char **argv) {
  kCFAllo<^X^U>
}

May 12 '12 11:05 cehoffman

Hi cehoffman,

I don't have time to work on this, but if you really want to improve performance there are three steps you should take:

1. Update to clang trunk. There have been a lot of performance changes.

2. Sync the libclang python bindings with the latest bindings included in LLVM trunk. The newer python bindings have added a couple of annotations that I have seen speeding up the completion quite a bit.

3. If this does not help, we could implement this filtering in libclang itself and expose it through the python bindings. This would remove the need for millions of function calls to retrieve all the strings.

Jun 12 '12 14:06 tobiasgrosser

@tobig could you comment on which version of (lib)clang or llvm you are using, and from which source? as mentioned in issue #152 i had severe problems using the current master, or even the release_31...

Jun 18 '12 15:06 marvin2k

Hi Marvin2k,

I always use LLVM trunk and update every couple of weeks. My performance is reasonable. ;-) What are your exact performance problems? As far as I understood, the speed is OK if you added the special include paths. Did I miss something?

Jun 19 '12 08:06 tobiasgrosser

sorry, i meant issue #152 ;-) hm, I thought during testing for issue #152 that the include-thingy seemed to be fixed, because leaving them out didn't change anything performance-wise...

so could you post the two commit revs (llvm, clang) you are using at the moment, together with the appropriate configure and make/make install calls? so all steps to reproduce your libclang.so

edit: after some time, it did work. see my last update in issue #152

Jun 19 '12 15:06 marvin2k

Currently having the same extremely slow completion as well. Completion taking longer than 5 seconds on a Core i7, Windows 8 x64. Compiled Clang svn trunk with VS2012. Python version is 2.7 64-bit. Compiled latest vim from source.

Apr 27 '13 23:04 veegee

On 04/28/2013 01:50 AM, V G wrote:

> Currently having the same extremely slow completion as well. Completion taking longer than 5 seconds on a Core i7, Windows 8 x64. Compiled Clang svn trunk with VS2012. Python version is 2.7 64-bit. Compiled latest vim from source.

Could you enable g:clang_debug, restart vim, perform code completion, type ":mess", copy the clang command it shows to the console, and see if it shows any errors?

Thanks, Tobi

Apr 28 '13 00:04 tobiasgrosser

Thanks for the reply. Windows is a huge pain in the ass as always. Turns out clang was compiled in the debug configuration for some reason, even though I set it to do a release build. I rebuilt clang and finally got it to do a release build. Now the completion is working as expected, with completion times under one second. I still don't think full optimization was used, so I'll do a bit more digging. But it seems to be essential to build clang with optimizations to keep completion times bearable.

Apr 28 '13 04:04 veegee
