main: experimental implementation of multi-pass parsing

Open masatake opened this issue 5 years ago • 8 comments

This pull request introduces the --_hint=<tag file> option and internal APIs for utilizing the given tags file. With these APIs, a parser can use a pre-existing tags file to improve the quality of parsing and tagging.

This option is not for incremental updating. Even if you specify --_hint=<tag file>, ctags parses all input files.

The Python parser is the initial target for these APIs. In the first pass, the Python parser attaches the "unknown" kind to X in "from Y import X". With the hint, the Python parser can resolve the real kind of X.

masatake avatar Nov 28 '20 22:11 masatake

  • ~https://github.com/universal-ctags/libreadtags/pull/25 must be merged~. (done)
  • ~https://github.com/universal-ctags/libreadtags/issues/22 must be fixed.~ (Solved in different way in https://github.com/universal-ctags/libreadtags/pull/28).

masatake avatar Nov 29 '20 05:11 masatake

Coverage Status

Coverage increased (+0.01%) to 87.037% when pulling 4aef85d631c000457b389c0d3c81c11d3ce6a2f1 on masatake:multi-pass into 09e951352533bacd46e4a745c88ef9eb0a5b997f on universal-ctags:master.

coveralls avatar Nov 29 '20 17:11 coveralls

Codecov Report

Merging #2741 (4aef85d) into master (c436bca) will decrease coverage by 0.43%. The diff coverage is 77.14%.

:exclamation: Current head 4aef85d differs from pull request most recent head 61e5266. Consider uploading reports for the commit 61e5266 to get more accurate results.

@@            Coverage Diff             @@
##           master    #2741      +/-   ##
==========================================
- Coverage   87.38%   86.95%   -0.44%     
==========================================
  Files         199      194       -5     
  Lines       47769    41114    -6655     
==========================================
- Hits        41743    35749    -5994     
+ Misses       6026     5365     -661     

Impacted Files               Coverage Δ
main/options.c               83.63% <ø> (-0.41%) :arrow_down:
main/hint.c                  54.54% <54.54%> (ø)
parsers/python.c             98.50% <97.22%> (-0.01%) :arrow_down:
extra-cmds/readtags-cmd.c    53.11% <100.00%> (-0.71%) :arrow_down:
main/htable.c                51.18% <0.00%> (-35.07%) :arrow_down:
main/ptrarray.c              56.89% <0.00%> (-28.11%) :arrow_down:
dsl/es.c                     44.01% <0.00%> (-10.37%) :arrow_down:
parsers/ada.c                70.94% <0.00%> (-9.50%) :arrow_down:
dsl/dsl.c                    75.05% <0.00%> (-6.67%) :arrow_down:
main/mbcs.c                  73.17% <0.00%> (-5.10%) :arrow_down:
... and 191 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c436bca...61e5266.

codecov[bot] avatar Dec 01 '20 10:12 codecov[bot]

Random ideas:

The --_hint option should be renamed, for example:

--_hint-file=tagfile: strict about errors. If an error occurs in calling libreadtags APIs, ctags may stop.
--_hint-file-maybe=tagfile: error-tolerant version of --_hint-file. If an error occurs in calling libreadtags APIs, ctags just doesn't use the hint file.
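
The difference between the two proposed options can be sketched as a small decision function. This is only an illustration: the enum and function names are hypothetical and not part of the PR, and `opened` stands in for the result of the real libreadtags open call.

```c
#include <stdbool.h>

/* Hypothetical policy names mirroring the two proposed options. */
typedef enum {
	HINT_POLICY_STRICT,   /* --_hint-file=tagfile */
	HINT_POLICY_MAYBE     /* --_hint-file-maybe=tagfile */
} hintPolicy;

typedef enum {
	HINT_USE,      /* the hint file was opened: use it */
	HINT_IGNORE,   /* failure is tolerated: run without hints */
	HINT_ABORT     /* failure is fatal: ctags may stop */
} hintAction;

/* Decide what to do after an attempt to open the hint file.
 * `opened` is an assumption standing in for the real open result. */
hintAction decideHintAction (hintPolicy policy, bool opened)
{
	if (opened)
		return HINT_USE;
	return (policy == HINT_POLICY_STRICT) ? HINT_ABORT : HINT_IGNORE;
}
```

With this split, the caller decides in one place whether a failed open ends the run or silently disables hints.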

ctags must reject specifying the same file for output and hint. Should I compare their inode numbers? <solve in this PR>
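
A minimal sketch of the inode comparison, assuming a POSIX system. The function names are hypothetical. An inode number is only unique within one filesystem, so the sketch compares the device number as well.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return true when both paths resolve to the same file, i.e. the same
 * device and inode pair. Return false if either path cannot be stat'ed
 * (for example, when the output file does not exist yet). POSIX only. */
bool isSameFile (const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat (a, &sa) != 0 || stat (b, &sb) != 0)
		return false;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical check to run before any output is written.
 * Returns true when the combination must be rejected. */
bool rejectSameOutputAndHint (const char *output, const char *hint)
{
	if (isSameFile (output, hint))
	{
		fprintf (stderr, "ctags: \"%s\" is used as both output and hint file\n",
				 output);
		return true;
	}
	return false;
}
```

Because stat() follows symbolic links, this also catches the case where the two options name the same file through different paths.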

More APIs for parsers:

bool isHintFileAvailable (void);

Just after opening a hint file, the main part of ctags should notify the parsers that define the following method <solve in this PR>:

void (*preprocessHintFile) (hintFile *file, hintFileInfo *info, langType lang);

This helps a parser build an including/included (or require/provide, use/used, or import/package) relation graph before parsing.
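
One way the notification could work is sketched below. Only the hook's prototype comes from the proposal above. The opaque type definitions, the trimmed-down `parserDef` struct, `notifyHintFileOpened`, and the example hook are hypothetical stand-ins, not the real ctags structures.

```c
#include <stddef.h>

/* Opaque stand-ins for the types named in the proposed prototype. */
typedef struct sHintFile hintFile;
typedef struct sHintFileInfo hintFileInfo;
typedef int langType;

/* Hypothetical, trimmed-down parser definition holding only the hook. */
typedef struct {
	const char *name;
	void (*preprocessHintFile) (hintFile *file, hintFileInfo *info, langType lang);
} parserDef;

/* Call the hook of every parser that defines one.
 * The array index is used as the language id for illustration.
 * Returns the number of parsers that were notified. */
size_t notifyHintFileOpened (parserDef *parsers, size_t count,
							 hintFile *file, hintFileInfo *info)
{
	size_t notified = 0;

	for (size_t i = 0; i < count; i++)
	{
		if (parsers[i].preprocessHintFile == NULL)
			continue;
		parsers[i].preprocessHintFile (file, info, (langType) i);
		notified++;
	}
	return notified;
}

/* Example hook: a real parser would walk the hint file here and record
 * import/package relations. This stub only counts its invocations. */
int pythonHookCalls = 0;

void pythonPreprocessHintFile (hintFile *file, hintFileInfo *info, langType lang)
{
	(void) file;
	(void) info;
	(void) lang;
	pythonHookCalls++;
}
```

Parsers that leave the hook as NULL are skipped, so existing parsers would not need to change.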

How about introducing tags.c, a parser for tags files?

./ctags a.tags

ctags parses a.tags. When ctags finds an F kind entry in a.tags, ctags parses the file tagged with that entry.
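
The core step of such a tags parser, recognizing an F kind entry and extracting the file it names, might look like the sketch below. The function name is hypothetical. The sketch assumes the kind is the first extension field after the `;"` terminator and that the ex command itself does not contain `;"` followed by a tab.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* If `line` is a tag entry whose kind is F (file), copy its input-file
 * field into `path` and return true. Pseudo tags ("!_...") and entries
 * of any other kind return false. */
bool fileKindEntryPath (const char *line, char *path, size_t pathSize)
{
	if (strncmp (line, "!_", 2) == 0)
		return false;

	/* Field 1 is the tag name, field 2 is the input file. */
	const char *tab1 = strchr (line, '\t');
	if (tab1 == NULL)
		return false;
	const char *file = tab1 + 1;
	const char *tab2 = strchr (file, '\t');
	if (tab2 == NULL)
		return false;

	/* Extension fields start after the `;"` that ends the ex command. */
	const char *ext = strstr (tab2 + 1, ";\"\t");
	if (ext == NULL)
		return false;
	const char *kind = ext + 3;
	if (strncmp (kind, "kind:", 5) == 0)
		kind += 5;

	/* The kind field must be exactly "F". */
	if (kind[0] != 'F')
		return false;
	if (kind[1] != '\0' && kind[1] != '\t' && kind[1] != '\n' && kind[1] != '\r')
		return false;

	size_t len = (size_t) (tab2 - file);
	if (len >= pathSize)
		return false;
	memcpy (path, file, len);
	path[len] = '\0';
	return true;
}
```

A driver would call this for each line of a.tags and queue every returned path for parsing.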

The code for realizing the multi-pass parsing and for updating a tags file are strongly related. But how?

Linking libreadtags, as proposed here, is obviously needed for both features. I guess we may need a "filesystem" language that deals with directories as first-class objects. tagEntryInfos for directories may be an important building block for dealing with "paths" such as include paths, module paths, library paths, etc.

For implementing "updating a tags file", I must revise the way of output.

When using a hint file, ctags must compare the options used for generating the hint file with the options just passed by the user. This comparison is much more important when updating a tags file. Should we accept options in the second pass? Saying NO is easy. However, we have to remember that a user can have many lines in their .ctags.

Comparing options is basic infrastructure for running ctags parsers in parallel.

To support other types of hint files, ctags must verify the type of the hint file using its filename extension and pattern, as ctags does when detecting a suitable parser for an input source file.

If querying hints is not done at hotspots, we can reuse the query engine used in the readtags command.

masatake avatar Dec 27 '20 10:12 masatake

I applied --_hint-file= to our C parser.

$ cat macro.h 
#define DEF(fn, rtype, signature, body)	\
	rtype fn signature BEGIN body END
$ u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o hint.tags macro.h 
$ cat hint.tags
!_TAG_FILE_FORMAT	2	/extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED	1	/0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD	mixed	/number, pattern, mixed, or combineV2/
!_TAG_OUTPUT_FILESEP	slash	/slash or backslash/
!_TAG_OUTPUT_MODE	u-ctags	/u-ctags or e-ctags/
!_TAG_PATTERN_LENGTH_LIMIT	96	/0 for no limit/
!_TAG_PROC_CWD	/home/jet/var/ctags-new/Units/parser-c.r/macrodef-hint-file.d/	//
!_TAG_PROGRAM_AUTHOR	Universal Ctags Team	//
!_TAG_PROGRAM_NAME	Universal Ctags	/Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL	https://ctags.io/	/official site/
!_TAG_PROGRAM_VERSION	5.9.0	/f35a3944/
DEF	macro.h	/^#define DEF(/;"	d	language:C++	signature:(fn,rtype,signature,body)	macrodef:rtype fn signature BEGIN body END
$ cat input.c 
#include "macro.h"

#define BEGIN {
#define END   }

DEF(add2, int, (int a, int b), a + b)
$ u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -o - input.c 
BEGIN	input.c	/^#define BEGIN /;"	d	file:
END	input.c	/^#define END /;"	d	file:
add2	input.c	/^DEF(add2, int, (int a, int b), a + b)$/;"	f	typeref:typename:int
$

I used the ctags command with my experimental patch to make tags for the QEMU source code that I'm reading now.

$ time u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o hint.tags  -R
u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o  -R  2.03s user 0.13s system 98% cpu 2.196 total
$ time u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R
u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R  99.66s user 41.11s system 99% cpu 2:21.08 total

Roughly 64 times slower in wall-clock time (about 2.2 s vs. 141 s).

masatake avatar Dec 28 '20 22:12 masatake

I implemented a negative hint cache.

$ time ~/bin/u-ctags --fields='+{line}' --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R
time ~/bin/u-ctags --fields='+{line}' --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R

real	0m20.691s
user	0m15.114s
sys	0m5.488s

This is 7 times faster than the version without the negative hint cache. However, it is still about 10 times slower than running without a hint file.
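
The thread does not show how the negative hint cache is implemented. The sketch below illustrates the general idea only, under the assumption that most names looked up in the hint file are absent from it, so remembering misses avoids repeating the expensive query. All names here are hypothetical, and the direct-mapped layout is a simplification chosen for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEG_CACHE_SLOTS    256   /* power of two, chosen arbitrarily */
#define NEG_CACHE_NAME_MAX 64

/* Direct-mapped cache of names known to be absent from the hint file.
 * Must be zero-initialized before use. */
typedef struct {
	char names[NEG_CACHE_SLOTS][NEG_CACHE_NAME_MAX];
} negCache;

/* Stand-in type for the expensive hint-file query. */
typedef bool (*hintLookupFn) (const char *name, void *data);

/* 32-bit FNV-1a hash reduced to a slot index. */
static size_t hashName (const char *s)
{
	uint32_t h = 2166136261u;

	for (; *s != '\0'; s++)
	{
		h ^= (unsigned char) *s;
		h *= 16777619u;
	}
	return (size_t) (h & (NEG_CACHE_SLOTS - 1));
}

/* Return true if `name` is found in the hint file. A miss is recorded
 * so that the next lookup of the same name skips `slowLookup`. */
bool lookupHintCached (negCache *cache, const char *name,
					   hintLookupFn slowLookup, void *data)
{
	size_t slot = hashName (name);

	if (name[0] != '\0' && strcmp (cache->names[slot], name) == 0)
		return false;   /* known miss: no query needed */

	if (slowLookup (name, data))
		return true;

	/* Remember the miss; a colliding older entry is simply overwritten. */
	if (strlen (name) < NEG_CACHE_NAME_MAX)
		strcpy (cache->names[slot], name);
	return false;
}

/* Example query for illustration: counts its calls and knows only "DEF". */
bool exampleLookup (const char *name, void *data)
{
	(*(int *) data)++;
	return strcmp (name, "DEF") == 0;
}
```

Hits are deliberately not cached in this sketch, since a hit normally needs the full entry from the hint file anyway.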

masatake avatar Dec 29 '20 17:12 masatake

This is related to #1960.

masatake avatar Feb 13 '21 03:02 masatake

https://github.com/universal-ctags/ctags/pull/2741/commits/e5d7cedc9e41aa734dda2d8b60f49ce73bd02db5 must be included in ctags6.

masatake avatar Nov 29 '22 13:11 masatake