main: experimental implementation of multi-pass parsing
This pull request introduces the --_hint=<tag file> option and internal APIs for using the given tags file.
With these APIs, a parser can use a pre-existing tags file to improve the quality of parsing and tagging.
This option is not for incremental updating.
Even if you specify --_hint=<tag file>, ctags parses all input files.
The Python parser is the initial target for the APIs. In the first pass, the Python parser attaches the "unknown" kind to X in "from Y import X". With the hint, the Python parser can resolve the real kind of X.
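To make the Python case concrete, here is a minimal C sketch of the resolution step. It assumes the hint entries have already been read into memory (in ctags they would come through libreadtags) and that module Y maps to a file name such as y.py; `struct hintEntry` and `resolveImportedKind` are hypothetical names, not the API added by this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of hint entries; in ctags they would be
   read from the hint tags file through libreadtags. */
struct hintEntry {
    const char *name;   /* tag name, e.g. "X" */
    const char *file;   /* input file the tag was recorded in, e.g. "y.py" */
    const char *kind;   /* kind name recorded in the hint file */
};

/* Return the kind the hint records for NAME defined in MODULE_FILE,
   or "unknown" (the first-pass kind) when no entry matches. */
static const char *resolveImportedKind(const struct hintEntry *hints, size_t count,
                                       const char *name, const char *moduleFile)
{
    assert(name != NULL && moduleFile != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(hints[i].name, name) == 0
            && strcmp(hints[i].file, moduleFile) == 0)
            return hints[i].kind;
    }
    return "unknown";
}
```

Keeping "unknown" as the fallback means a missing or incomplete hint file degrades to exactly the first-pass result.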
- ~https://github.com/universal-ctags/libreadtags/pull/25 must be merged~. (done)
- ~https://github.com/universal-ctags/libreadtags/issues/22 must be fixed.~ (Solved in different way in https://github.com/universal-ctags/libreadtags/pull/28).
Coverage increased (+0.01%) to 87.037% when pulling 4aef85d631c000457b389c0d3c81c11d3ce6a2f1 on masatake:multi-pass into 09e951352533bacd46e4a745c88ef9eb0a5b997f on universal-ctags:master.
Codecov Report
Merging #2741 (4aef85d) into master (c436bca) will decrease coverage by 0.43%. The diff coverage is 77.14%.
:exclamation: The current head 4aef85d differs from the pull request's most recent head 61e5266. Consider uploading reports for commit 61e5266 to get more accurate results.
Coverage Diff:

| | master | #2741 | +/- |
|---|---|---|---|
| Coverage | 87.38% | 86.95% | -0.44% |
| Files | 199 | 194 | -5 |
| Lines | 47769 | 41114 | -6655 |
| Hits | 41743 | 35749 | -5994 |
| Misses | 6026 | 5365 | -661 |
| Impacted Files | Coverage Δ | |
|---|---|---|
| main/options.c | 83.63% <ø> (-0.41%) | :arrow_down: |
| main/hint.c | 54.54% <54.54%> (ø) | |
| parsers/python.c | 98.50% <97.22%> (-0.01%) | :arrow_down: |
| extra-cmds/readtags-cmd.c | 53.11% <100.00%> (-0.71%) | :arrow_down: |
| main/htable.c | 51.18% <0.00%> (-35.07%) | :arrow_down: |
| main/ptrarray.c | 56.89% <0.00%> (-28.11%) | :arrow_down: |
| dsl/es.c | 44.01% <0.00%> (-10.37%) | :arrow_down: |
| parsers/ada.c | 70.94% <0.00%> (-9.50%) | :arrow_down: |
| dsl/dsl.c | 75.05% <0.00%> (-6.67%) | :arrow_down: |
| main/mbcs.c | 73.17% <0.00%> (-5.10%) | :arrow_down: |
| ... and 191 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c436bca...61e5266.
Random ideas:
The option --_hint should be renamed, for example:
- --_hint-file=tagfile: strict about errors. If an error occurs while calling libreadtags APIs, ctags may stop.
- --_hint-file-maybe=tagfile: error-tolerant version of --_hint-file. If an error occurs while calling libreadtags APIs, ctags just doesn't use the hint file.
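A minimal sketch of how the two proposed spellings could differ in behavior. The enum names and both functions are hypothetical, and `fopen()` stands in for the libreadtags open call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policies matching the two proposed option spellings. */
enum hintPolicy {
    HINT_STRICT,    /* --_hint-file=tagfile       */
    HINT_TOLERANT   /* --_hint-file-maybe=tagfile */
};

enum hintOutcome {
    HINT_USE,       /* hint file opened: use it               */
    HINT_SKIP,      /* error tolerated: run without a hint    */
    HINT_FATAL      /* error not tolerated: ctags should stop */
};

static enum hintOutcome decideHintOutcome(enum hintPolicy policy, bool opened)
{
    if (opened)
        return HINT_USE;
    return (policy == HINT_STRICT) ? HINT_FATAL : HINT_SKIP;
}

/* fopen() stands in for the libreadtags open call in this sketch.
   Returns the open file or NULL; *fatal tells the caller whether to stop. */
static FILE *openHintFile(const char *path, enum hintPolicy policy, bool *fatal)
{
    assert(path != NULL && fatal != NULL);
    FILE *fp = fopen(path, "r");
    *fatal = (decideHintOutcome(policy, fp != NULL) == HINT_FATAL);
    return fp;
}
```

Keeping the decision in a pure function makes the two policies easy to test without touching the filesystem.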
ctags must reject specifying the same file for both output and hint. Should I compare their inode numbers? (solve in this PR)
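One way to answer the inode question is to compare both the device and the inode numbers returned by POSIX `stat()`, because inode numbers are only unique within one filesystem. This is a sketch: `isSameFile` and `checkHintIsNotOutput` are hypothetical helpers, the error message is illustrative rather than real ctags output, and a caller would still special-case `-o -` (standard output) before calling them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return true when OUTPUT and HINT refer to the same file, including via
   a symlink (stat() follows it) or a hard link (same inode).  Inode
   numbers are unique only per filesystem, so st_dev must match as well.
   If either stat() fails (e.g. the output file does not exist yet), the
   two cannot be the same existing file, so the answer is false. */
static bool isSameFile(const char *output, const char *hint)
{
    struct stat so, sh;

    assert(output != NULL && hint != NULL);
    if (stat(output, &so) != 0 || stat(hint, &sh) != 0)
        return false;
    return so.st_dev == sh.st_dev && so.st_ino == sh.st_ino;
}

/* Hypothetical check run before opening the hint file. */
static bool checkHintIsNotOutput(const char *output, const char *hint)
{
    if (isSameFile(output, hint)) {
        fprintf(stderr, "hint file and output file are the same: %s\n", hint);
        return false;
    }
    return true;
}
```

Comparing path strings instead would miss `./hint.tags` versus `hint.tags`, symlinks, and hard links; the (device, inode) pair catches all three.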
More APIs for parsers:
- bool isHintFileAvailable (void);
- Just after opening a hint file, the main part of ctags should notify the parsers that have the following method (solve in this PR):
  void (*preprocessHintFile) (hintFile *file, hintFileInfo *info, langType lang);
  This helps a parser build an include/included (or require/provide, use/used, import/package) relation graph before parsing.
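A sketch of how the main part could dispatch the proposed notification. `hintFile`, `hintFileInfo`, `langType`, and the `parserHintMethods` table are simplified, hypothetical stand-ins for ctags' real structures; only the callback's prototype comes from the proposal above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for ctags' real types. */
typedef int langType;
typedef struct { const char *path; } hintFile;
typedef struct { const char *programName; } hintFileInfo;

/* Per-parser entry; the callback prototype is the one proposed above. */
typedef struct {
    const char *name;
    void (*preprocessHintFile) (hintFile *file, hintFileInfo *info, langType lang);
} parserHintMethods;

/* Call preprocessHintFile on every parser that defines it, passing the
   parser's index as its langType.  Returns how many parsers were notified. */
static size_t notifyHintFileOpened(parserHintMethods *parsers, size_t count,
                                   hintFile *file, hintFileInfo *info)
{
    size_t notified = 0;

    assert(parsers != NULL && file != NULL);
    for (size_t i = 0; i < count; i++) {
        if (parsers[i].preprocessHintFile == NULL)
            continue;                 /* this parser ignores hints */
        parsers[i].preprocessHintFile(file, info, (langType)i);
        notified++;
    }
    return notified;
}

/* Example hook: a real parser would scan the hint file here and build its
   import/include graph; this one only counts calls. */
static int examplePreprocessCalls;
static void examplePreprocessHintFile(hintFile *file, hintFileInfo *info, langType lang)
{
    (void)file; (void)info; (void)lang;
    examplePreprocessCalls++;
}
```

Making the method optional (a NULL pointer) lets parsers that never use hints stay unchanged.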
How about introducing tags.c, a parser for tags files?
./ctags a.tags
ctags parses a.tags. When ctags finds an F-kind entry in a.tags, it parses the file tagged by that entry.
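A minimal sketch of the scanning step such a tags.c parser would need: recognizing an F-kind entry and extracting the input file it names. It assumes the extended format with a one-letter kind right after the `;"` separator, and it does not handle a search pattern that itself contains `;"` followed by a TAB; `fileEntryPath` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* If LINE (one line of a tags file, without the trailing newline) is an
   F-kind entry, copy its input-file field into OUT and return true.
   Layout assumed: name<TAB>file<TAB>address;"<TAB>kind[<TAB>field...] */
static bool fileEntryPath(const char *line, char *out, size_t outSize)
{
    assert(line != NULL && out != NULL && outSize > 0);
    if (line[0] == '!')                      /* pseudo tag, e.g. !_TAG_FILE_FORMAT */
        return false;

    const char *tab1 = strchr(line, '\t');
    if (tab1 == NULL)
        return false;
    const char *file = tab1 + 1;
    const char *tab2 = strchr(file, '\t');
    if (tab2 == NULL)
        return false;

    const char *sep = strstr(tab2, ";\"\t"); /* end of the address field */
    if (sep == NULL)
        return false;
    const char *kind = sep + 3;
    if (kind[0] != 'F' || (kind[1] != '\t' && kind[1] != '\0'))
        return false;

    size_t len = (size_t)(tab2 - file);
    if (len >= outSize)
        return false;                        /* path does not fit in OUT */
    memcpy(out, file, len);
    out[len] = '\0';
    return true;
}
```

The parser would then queue each returned path for parsing, exactly as if it had been given on the command line.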
The code for multi-pass parsing and the code for updating a tags file are strongly related. But how?
Linking libreadtags, as proposed here, is obviously needed for both features. I guess we may need a "filesystem" language that treats directories as first-class objects. tagEntryInfos for directories may be an important building block for dealing with paths such as the include path, module path, library path, etc.
To implement updating a tags file, I must revise how ctags writes its output.
When using a hint file, ctags must compare the options used to generate the hint file with the options the user just passed. This comparison is even more important when updating a tags file. Should we accept options in the second pass? Saying NO is easy. However, we have to remember that a user can have many lines in their .ctags.
Comparing options is basic infrastructure for running ctags parsers in parallel.
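Because ctags applies options in order, comparing the raw option strings would reject equivalent configurations spread over many .ctags lines. The sketch below compares effective settings instead. It models only one-letter `--fields` flags, following the documented `--fields=[+|-]flags` form in which a value without a leading sign replaces the set; long `{name}` fields and every other option are out of scope, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Map a one-letter field flag to a bit: 'a'..'z' -> 0..25, 'A'..'Z' -> 26..51.
   Other characters map to 0 and are ignored. */
static unsigned long long fieldBit(char c)
{
    if (c >= 'a' && c <= 'z')
        return 1ULL << (c - 'a');
    if (c >= 'A' && c <= 'Z')
        return 1ULL << (26 + (c - 'A'));
    return 0;
}

/* Apply one --fields=VALUE to MASK.  Without a leading sign the listed
   flags replace the set; '+' adds and '-' removes the letters after it. */
static unsigned long long applyFieldsValue(unsigned long long mask, const char *value)
{
    bool removing = false;

    assert(value != NULL);
    if (*value != '+' && *value != '-')
        mask = 0;
    for (; *value != '\0'; value++) {
        if (*value == '+')
            removing = false;
        else if (*value == '-')
            removing = true;
        else if (removing)
            mask &= ~fieldBit(*value);
        else
            mask |= fieldBit(*value);
    }
    return mask;
}

/* Fold a sequence of --fields values, in order, over DEFAULTS. */
static unsigned long long effectiveFields(unsigned long long defaults,
                                          const char *const *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        defaults = applyFieldsValue(defaults, values[i]);
    return defaults;
}

/* The hint file is compatible (for --fields) when both option sequences
   end in the same effective field set. */
static bool sameEffectiveFields(unsigned long long defaults,
                                const char *const *hintOpts, size_t nHint,
                                const char *const *userOpts, size_t nUser)
{
    return effectiveFields(defaults, hintOpts, nHint)
        == effectiveFields(defaults, userOpts, nUser);
}
```

For example, `--fields=+n --fields=+K --fields=-n` and a single `--fields=+K` produce the same effective set, so this check would accept the hint file.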
To support other types of hint files, ctags must verify the type of a hint file by its filename extension and pattern, as it does when detecting a suitable parser for an input source file.
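A sketch of extension-then-pattern detection for hint files, using POSIX `fnmatch()` for the pattern step. The extension and pattern tables are made-up examples, not a proposed list, and `detectHintFileType` is a hypothetical name.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

enum hintFileType { HINT_TYPE_UNKNOWN, HINT_TYPE_CTAGS };

/* Hypothetical tables identifying a ctags-format hint file. */
static const char *const ctagsHintExtensions[] = { "tags" };
static const char *const ctagsHintPatterns[]   = { "tags", "tags-*" };

static enum hintFileType detectHintFileType(const char *path)
{
    assert(path != NULL);
    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;

    /* 1. Filename extension, e.g. "hint.tags". */
    const char *dot = strrchr(base, '.');
    if (dot != NULL)
        for (size_t i = 0; i < sizeof ctagsHintExtensions / sizeof ctagsHintExtensions[0]; i++)
            if (strcmp(dot + 1, ctagsHintExtensions[i]) == 0)
                return HINT_TYPE_CTAGS;

    /* 2. Filename pattern, matched against the base name. */
    for (size_t i = 0; i < sizeof ctagsHintPatterns / sizeof ctagsHintPatterns[0]; i++)
        if (fnmatch(ctagsHintPatterns[i], base, 0) == 0)
            return HINT_TYPE_CTAGS;

    return HINT_TYPE_UNKNOWN;
}
```

Adding a new hint-file type would then mean adding another pair of tables and another enum value, mirroring how a parser declares its extensions and patterns.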
If querying hints is not done at hotspots, we can reuse the query engine used in the readtags command.
I applied --_hint-file= to our C parser.
$ cat macro.h
#define DEF(fn, rtype, signature, body) \
rtype fn signature BEGIN body END
$ u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o hint.tags macro.h
$ cat hint.tags
!_TAG_FILE_FORMAT 2 /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED 1 /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD mixed /number, pattern, mixed, or combineV2/
!_TAG_OUTPUT_FILESEP slash /slash or backslash/
!_TAG_OUTPUT_MODE u-ctags /u-ctags or e-ctags/
!_TAG_PATTERN_LENGTH_LIMIT 96 /0 for no limit/
!_TAG_PROC_CWD /home/jet/var/ctags-new/Units/parser-c.r/macrodef-hint-file.d/ //
!_TAG_PROGRAM_AUTHOR Universal Ctags Team //
!_TAG_PROGRAM_NAME Universal Ctags /Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL https://ctags.io/ /official site/
!_TAG_PROGRAM_VERSION 5.9.0 /f35a3944/
DEF macro.h /^#define DEF(/;" d language:C++ signature:(fn,rtype,signature,body) macrodef:rtype fn signature BEGIN body END
$ cat input.c
#include "macro.h"
#define BEGIN {
#define END }
DEF(add2, int, (int a, int b), a + b)
$ u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -o - input.c
BEGIN input.c /^#define BEGIN /;" d file:
END input.c /^#define END /;" d file:
add2 input.c /^DEF(add2, int, (int a, int b), a + b)$/;" f typeref:typename:int
$
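In the run above, the C parser can expand DEF in input.c only because the hint entry for DEF carries `signature:` and `macrodef:` extension fields. A minimal sketch of pulling such a field out of one tags line; `tagField` is hypothetical, and it does not unescape values (u-ctags may escape characters such as TAB inside field values).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copy the value of extension field KEY (e.g. "macrodef") from one tags
   line into OUT.  Extension fields follow the ;"<TAB> separator and are
   TAB-separated; the kind letter before them has no ':' and is skipped. */
static bool tagField(const char *line, const char *key, char *out, size_t outSize)
{
    assert(line != NULL && key != NULL && out != NULL && outSize > 0);
    const char *p = strstr(line, ";\"\t");
    if (p == NULL)
        return false;
    p += 3;

    size_t keyLen = strlen(key);
    while (*p != '\0') {
        const char *end = strchr(p, '\t');
        size_t fieldLen = (end != NULL) ? (size_t)(end - p) : strlen(p);
        if (fieldLen > keyLen && strncmp(p, key, keyLen) == 0 && p[keyLen] == ':') {
            size_t valLen = fieldLen - keyLen - 1;
            if (valLen >= outSize)
                return false;            /* value does not fit in OUT */
            memcpy(out, p + keyLen + 1, valLen);
            out[valLen] = '\0';
            return true;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return false;
}
```

With the DEF line from hint.tags, `tagField(line, "macrodef", ...)` yields the macro body and `tagField(line, "signature", ...)` its parameter list, which together are enough to expand `DEF(add2, ...)`.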
I used ctags with my experimental patch to make tags for the QEMU source code I'm reading now.
$ time u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o hint.tags -R
u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o -R 2.03s user 0.13s system 98% cpu 2.196 total
$ time u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R
u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R 99.66s user 41.11s system 99% cpu 2:21.08 total
Roughly 64 times slower (2:21.08 vs. 2.196 s total).
I implemented a negative hint cache.
$ time ~/bin/u-ctags --fields='+{line}' --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R
real 0m20.691s
user 0m15.114s
sys 0m5.488s
About 7 times faster than the version without the negative hint cache (20.7 s vs. 141 s). However, it is still about 10 times slower than running without a hint file (20.7 s vs. 2.2 s).
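A minimal sketch of the idea behind a negative hint cache: remember names the hint file is known not to contain, so a repeated lookup of, for example, BEGIN skips the expensive query. The fixed-size open-addressing table, the FNV-1a hash, and the `fakeQuery` stand-in are assumptions for illustration, not how this PR implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NEG_CACHE_SLOTS 1024u   /* power of two, fixed for this sketch */

/* Names the hint file is known NOT to contain.  NULL marks an empty slot. */
struct negativeCache {
    char *slots[NEG_CACHE_SLOTS];
    size_t used;
};

static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

static bool negCacheContains(const struct negativeCache *c, const char *name)
{
    uint32_t i = fnv1a(name);
    for (size_t probes = 0; probes < NEG_CACHE_SLOTS; probes++, i++) {
        const char *s = c->slots[i & (NEG_CACHE_SLOTS - 1)];
        if (s == NULL)
            return false;           /* empty slot ends the probe sequence */
        if (strcmp(s, name) == 0)
            return true;
    }
    return false;
}

/* Remember that NAME is absent.  Once the table is 3/4 full, new names are
   not cached and later lookups simply fall through to the real query. */
static void negCacheAdd(struct negativeCache *c, const char *name)
{
    if (c->used >= NEG_CACHE_SLOTS / 4 * 3 || negCacheContains(c, name))
        return;
    for (uint32_t i = fnv1a(name); ; i++) {
        char **slot = &c->slots[i & (NEG_CACHE_SLOTS - 1)];
        if (*slot == NULL) {
            size_t len = strlen(name) + 1;
            if ((*slot = malloc(len)) == NULL)
                return;             /* out of memory: just don't cache */
            memcpy(*slot, name, len);
            c->used++;
            return;
        }
    }
}

static void negCacheFree(struct negativeCache *c)
{
    for (size_t i = 0; i < NEG_CACHE_SLOTS; i++) {
        free(c->slots[i]);
        c->slots[i] = NULL;
    }
    c->used = 0;
}

/* Ask whether the hint file has NAME, calling the expensive QUERY only
   when the cache does not already know the answer is "no". */
static bool hintHasName(struct negativeCache *c, const char *name,
                        bool (*query)(const char *name))
{
    if (negCacheContains(c, name))
        return false;               /* known miss: skip the query */
    if (query(name))
        return true;                /* hits are not cached in this sketch */
    negCacheAdd(c, name);
    return false;
}

/* Stand-in for a libreadtags lookup; counts calls to show the cache working. */
static int fakeQueryCalls;
static bool fakeQuery(const char *name)
{
    fakeQueryCalls++;
    return strcmp(name, "DEF") == 0;
}
```

Caching only misses fits the C macro case, where most identifiers looked up while expanding are not macros defined in the hint file at all.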
This is related to #1960.
Commit https://github.com/universal-ctags/ctags/pull/2741/commits/e5d7cedc9e41aa734dda2d8b60f49ce73bd02db5 must be included in ctags6.