Cxx: extract reference tags
See also https://github.com/universal-ctags/ctags/pull/569.
Signed-off-by: Masatake YAMATO [email protected]
TODO:
- [x] evaluate reference tags on Citre
- [ ] update manual pages
- [ ] merge #569
- [ ] talk to geany people about adding Y to C parser
Codecov Report
Patch coverage: 95.96% and project coverage change: +0.05 :tada:
Comparison is base (
7e5c8dc) 82.83% compared to head (ef7e809) 82.88%.
Additional details and impacted files
@@ Coverage Diff @@
## master #3535 +/- ##
==========================================
+ Coverage 82.83% 82.88% +0.05%
==========================================
Files 226 227 +1
Lines 54754 54870 +116
==========================================
+ Hits 45353 45478 +125
+ Misses 9401 9392 -9
| Impacted Files | Coverage Δ | |
|---|---|---|
| parsers/cxx/cxx_parser_namespace.c | 80.29% <ø> (ø) |
|
| parsers/cxx/cxx_parser_template.c | 74.71% <ø> (ø) |
|
| parsers/cxx/cxx_token_chain.c | 82.41% <ø> (ø) |
|
| parsers/cxx/cxx_reftag.c | 91.17% <91.17%> (ø) |
|
| parsers/cxx/cxx_parser_variable.c | 92.30% <96.42%> (+0.61%) |
:arrow_up: |
| parsers/cxx/cxx_parser.c | 86.60% <100.00%> (ø) |
|
| parsers/cxx/cxx_parser_function.c | 93.09% <100.00%> (+<0.01%) |
:arrow_up: |
| parsers/cxx/cxx_parser_lambda.c | 93.33% <100.00%> (ø) |
|
| parsers/cxx/cxx_parser_tokenizer.c | 92.60% <100.00%> (+0.12%) |
:arrow_up: |
| parsers/cxx/cxx_parser_typedef.c | 92.90% <100.00%> (ø) |
|
| ... and 3 more |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
For the latest linux kernel source, ctags with this pull request generates 8GB tags file. It is not small, however, citre works fine. The option I used:
time ~/bin/ctags -o ~/.citre/upstream-linux.tags --kinds-all='*' --fields='*' --fields-all='*' --extras='*' --extras-all='*' --roles-all.'*'='*' -R ~/var/linux
It takes about 10 minutes for generating a tags file. I used a PC with 256GB RAM + HDD.
Citre works fine but I think we can improve UI for reference tags.
This unknown kind +ref role approach can be applied to all token-based parsers like python and javascript parsers.

- I pressed M-. on a definition, I expected reference tags are listed first.
- I want more emphasized representation for ref tags. Just <R> is not enough.
- in the future, roles must be displayed.
No built-in parser users Y nor W as a kind letter.
Let's use of one them instead of X for representing "unknown" kind.
Citre works fine but I think we can improve UI for reference tags.
Citre now takes a pluggable backend design, citre-find-reference-backends supports the following functionalities:
- xref-find-references
- citre-jump-to-reference
- citre-peek-references, citre-ace-peek-references, citre-peek-through-references
Currently the only "find reference" backend is global backend. I could create a tags backend.
@AmaiKinono Thank you for your response.
I have used this branch + citre (87e2cbf3b2ae6d59ec919a2dcb38e56ccfa5ec14, a bit othreeder) for 3 days to read Linux kernel. M-. becomes slower maybe because readtags reports too many tags. This slowness is acceptable if the command I use is M-?. However, I would like to keep M-. faster.
The question is whether M-. of citre users becomes slower or not if I merge this branch today.
--roles-C.{unknown}=ref (--roles-all.{unknown}=ref, --roles-all.{unknown}=*, or --roles-all.*=*) is needed to make ctags emit the reference tags newly introduced in this pull request. Just adding --extras=+r --fields=+r is not enough. I intentionally made ctgas require the extra option so that users don't make too large tags file unexpectedly.
The instructions (https://github.com/universal-ctags/citre/blob/master/docs/user-manual/about-tags-file.md#create-informative-tags-file) about the way to run ctags, do not enable the "ref" role of "unknown" kind. So I can say merging this branch today doesn't get citre users in trouble.
M-. becomes slower maybe because readtags reports too many tags.
The problem is until now, Citre assumes tags files don't contain "reference tags" in the general sense. u-ctags only emits reference tags that point to external modules/files (let's call them dependency references for now). They are useful and aren't too many, so Citre doesn't filter them out when finding definitions.
But after this PR, uctags emits reference tags pointing to general entities (let's call them general reference for now). Citre should exclude them when finding definitions, and include them when finding references. We could do this in 2 ways:
- Only exclude general reference tags. This has the advantage of keeping M-. behavior of Citre unchanged. We could do this by excluding tags that have "unknown" kind and "ref" role, but if in the future, they are tagged with the correct kind (variable, function, etc.), this technique won't work anymore.
- Exclude all reference tags. In this way, M-. on a module/file name won't list its references. You have to use M-? for that.
Personally I feel the second way is better.
The question is whether M-. of citre users becomes slower or not if I merge this branch today.
It won't become slower as you already pointed out.
P.S. Actually I think tagsfile is not the best format for references. If we create a tagline for every symbol in a codebase, the tagsfile could become larger than the codebase itself. A binary database (like GNU global does) could save much space. But this is only my own thoughts. It may be unrealistic to make necessary changes to uctags for solving this problem.
@AmaiKinono You wrote what I thought but I could not write.
P.S. Actually I think tagsfile is not the best format for references. If we create a tagline for every symbol in a codebase, the tagsfile could become larger than the codebase itself. A binary database (like GNU global does) could save much space. But this is only my own thoughts. It may be unrealistic to make necessary changes to uctags for solving this problem.
I agree with you. The "space" is one of the biggest reasons why I have not implemented reference tags.
let's call them dependency references for now let's call them general reference for now
Good ideas. These proposals help me when extending our man pages.
Python may extract too many 'self'.
EmacsLisp u unknown yes no 0 NONE unknown type of definitions
Go u unknown yes yes 1 NONE unknown
Lisp u unknown yes no 0 NONE unknown type of definitions
Python x unknown yes no 2 NONE name referring a class/variable/function/module defined in other module
Though breaking compatibilities, I will change the letters for unknown to Y.
4615250..fe0c93a can be merged in advance.
As I expected, the new idea, "assigning a cork index to a token", allows us to implement reference tags more than the "unknown" kind.
input1.c:
struct ops file_ops = {
.readfn = file_readfn,
.writefn = file_writefn,
};
output1.tags
ops ... kind:unknown ... roles:vardef
file_ops ... kind:variable ... roles:def ... typeref:struct:ops
readfn ... kind:member ... roles:initialized ... scope:variable:file_ops
file_readfn ... kind:unknown ... roles:value ... scope:variable:file_ops
writefn ... kind:member .... roles:initialized ... scope:variable:file_ops
file_writefn ...kind:unknown ... roles:value ... scope:variable:file_ops
$ cat /tmp/foo.c
#include <stdio.h>
int main(void)
{
return puts("hello, world\n");
}
$ ./ctags --kinds-C=+Y --roles-C.Y=+'{ref}' --extras=+r --fields=+r -o - /tmp/foo.c
main /tmp/foo.c /^int main(void)$/;" f typeref:typename:int roles:def
puts /tmp/foo.c /^ return puts("hello, world\\n");$/;" Y function:main roles:applied
stdio.h /tmp/foo.c /^#include <stdio.h>/;" h roles:system
There are still many TODO items, ctags running locally successfully extracted puts, a macro expanded or a function called in main. See the "roles:" field of "puts". The scope was also filled. As I wrote ago, citre works well.
We should add the rule, "unknown+ref should have lower priority" to Citre's sorting engine.
@AmaiKinono
When I looked up a rather generic name like "inode", M-. took very long time. For looking up reference tags, it is acceptable; I will just do C-g. However, when I was interested only in its definition, taking too long time was not acceptable.
So for evaluation of this pull request, I added some code to Citre:
diff --git a/citre-tags.el b/citre-tags.el
index 4ce793e..bdcc520 100644
--- a/citre-tags.el
+++ b/citre-tags.el
@@ -375,6 +375,7 @@ completion can't be done."
(file-path (citre-get-property 'file-path symbol)))
`(not
(or
+ ,(citre-readtags-filter 'extras "reference" 'csv-contain)
;; Don't excluded "anonymous" here as such symbols can appear in typeref
;; or scope fields of other tags, which may be shown in an xref buffer,
;; so we should be able to find their definitions.
@@ -413,6 +414,40 @@ is found."
:require '(name ext-abspath pattern)
:optional '(ext-kind-full line typeref scope extras))))
+;;;; Find references
+
+(defun citre-tags-reference-default-filter (symbol)
+ (let ((tags-file (citre-get-property 'tags-file symbol))
+ (file-path (citre-get-property 'file-path symbol)))
+ `(not
+ (or
+ ;; Don't excluded "anonymous" here as such symbols can appear in typeref
+ ;; or scope fields of other tags, which may be shown in an xref buffer,
+ ;; so we should be able to find their definitions.
+ ,(if file-path
+ (citre-tags-filter-local-symbol-in-other-file file-path tags-file)
+ 'false)))))
+
+(defvar citre-tags-reference-default-sorter
+ (citre-readtags-sorter
+ citre-tags-sorter-arg-put-references-below
+ 'input '(length name +) 'name
+ citre-tags-sorter-arg-size-order))
+
+(defun citre-tags-get-references (&optional symbol tagsfile)
+ (when-let* ((symbol (or symbol (citre-tags-get-symbol tagsfile)))
+ (tagsfile (or tagsfile (citre-get-property 'tags-file symbol))))
+ (citre-tags-get-tags
+ tagsfile symbol 'exact
+ :filter (or (citre-tags--get-value-in-language-alist
+ :reference-filter symbol)
+ (citre-tags-reference-default-filter symbol))
+ :sorter (or (citre-tags--get-value-in-language-alist
+ :reference-sorter symbol)
+ citre-tags-reference-default-sorter)
+ :require '(name ext-abspath pattern)
+ :optional '(ext-kind-full line typeref scope extras))))
+
;;;; Completion backend
(defvar citre-tags--completion-cache
@@ -483,7 +518,9 @@ The result is a list (BEG END TAGS), see
(citre-tags-get-definitions symbol tagsfile)))
(defvar citre-tags--find-definition-for-id-filter
- `(not ,(citre-readtags-filter 'extras "anonymous" 'csv-contain))
+ `(not (or
+ ,(citre-readtags-filter 'extras "reference" 'csv-contain)
+ ,(citre-readtags-filter 'extras "anonymous" 'csv-contain)))
"Filter for finding definitions when the symbol is inputted by user.")
(defvar citre-tags--id-list-cache
@@ -546,6 +583,16 @@ simple tag name matching. This function is for it."
:identifier-list-func #'citre-tags-get-identifiers
:get-definitions-for-id-func #'citre-tags--get-definition-for-id)
+;;;; Find refernces backend
+(defun citre-tags-get-references-at-point ()
+ "Get definitions of symbol at point."
+ (when-let ((tagsfile (citre-tags-file-path))
+ (symbol (citre-tags-get-symbol)))
+ (citre-tags-get-references symbol tagsfile)))
+
+(citre-register-find-reference-backend
+ 'tags #'citre-tags-get-references-at-point)
+
;;;; Tags in buffer backend
(declare-function tramp-get-remote-tmpdir "tramp" (vec))
So for evaluation of this pull request, I added some code to Citre:
Thanks! This is really helpful.
May I ask you for a PR to Citre, or can I use your code directly?
May I ask you for a PR to Citre, or can I use your code directly?
This is a quick hack. Can you please add a new code for M-?. I'd be happy if my code could be used as a base for the new code.
What I didn't add was code for 'C-u M-?'. I want to have it.
You wrote:
But after this PR, uctags emits reference tags pointing to general entities (let's call them general reference for now). Citre should exclude them when finding definitions, and include them when finding references. We could do this in 2 ways:
Only exclude general reference tags. This has the advantage of keeping M-. behavior of Citre unchanged. We could do this by excluding tags that have "unknown" kind and "ref" role, but if in the future, they are tagged with the correct kind (variable, function, etc.), this technique won't work anymore.
Exclude all reference tags. In this way, M-. on a module/file name won't list its references. You have to use M-? for that.
Personally I feel the second way is better.
I agree with you.
P.S. Actually I think tagsfile is not the best format for references. If we create a tagline for every symbol in a codebase, the tagsfile could become larger than the codebase itself. A binary database (like GNU global does) could save much space. But this is only my own thoughts. It may be unrealistic to make necessary changes to uctags for solving this problem.
I agree with you. However, I would like to use the tags output till I understand what I want.
What I didn't add was code for 'C-u M-?'. I want to have it.
This one is hard.
When finding references of a user-inputted symbol, what xref does is get the symbol by xref-backend-identifier-completion-table, then pass it to xref-backend-references, with the text property stripped.
Citre is an xref backend and itself also has backends. Suppose a Citre backend A can give us all symbols in a project (so it could support xref-backend-identifier-completion-table), but doesn't support finding references, while another Citre backend B supports finding references of a symbol. When I press C-u M-?, Citre tries the backends in turn, and this could happen: we get the symbol name from backend A, and feed it to backend B to find the references. This is wrong, we should get the symbol from backend B.
Before passing a symbol to xref-backend-references, xref removes its text properties, so we could not know which Citre backend gives us the symbol. This makes designing a "xref backend with multiple backends" (like Citre) very hard.
This may be solved by some jump wire code but it needs some design... We are too away from the theme of this PR. Now I know what you want and I'll see how to implement it ;)