reftag in scheme (Was: adding reference tag function)
Hello, this pull request adds a new feature that generates reference tags. Currently, it works in combination with GNU GLOBAL (*1).
*1: GNU GLOBAL (http://www.gnu.org/software/global/) is a source code tagging system. It also works as a tagging framework that offers various features, such as the following:
- Project base concept
- High performance & machine architecture independent
- Incremental updating of tag files
- Tag search and path search using regular expression
- ... (please see http://www.gnu.org/software/global/)
With this modification, it becomes possible to write a parser that handles reference tags with little effort. Such a parser can be used with both ctags (definitions) and GNU GLOBAL (definitions and references).
What is included?
The modification includes the following.
- makeSimpleReferenceTag() function
  This function makes a reference tag. Currently, ctags prints it only when both the -x option and the --gtags option (explained later) are specified. Basically, you may use the function for all symbols except definitions.
- Modified scheme parser (parsers/scheme.c)
  This is an example parser showing how to use the makeSimpleReferenceTag() function.
- --gtags option and test
  If this option is specified together with the -x option, ctags prints reference tags as well as definition tags, with a type string at the head of each output line. The type string is one of the following:
  D definition
  R reference
  GNU GLOBAL has understood this format since global-6.2.6.
Since the modified ctags is upward compatible, its output is the same as the original if you don't use the --gtags option.
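As an illustration of the output layout described above, here is a minimal sketch of how a consumer could split one "-x --gtags" line into its parts. It is not code from ctags or GLOBAL: GtagsRecord, parse_gtags_line and the buffer sizes are made up for this example, and it assumes file names contain no spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record of the "-x --gtags" output.
 * Field sizes are arbitrary limits chosen for this sketch. */
typedef struct {
    char type;          /* 'D' (definition) or 'R' (reference) */
    char name[64];      /* tag name */
    unsigned long line; /* line number in the source file */
    char file[256];     /* source file path (assumed to contain no spaces) */
    char text[512];     /* the source line itself */
} GtagsRecord;

/* Parse "TYPE NAME LINE FILE TEXT..." into *rec.
 * Returns 1 on success, 0 if the line does not match the layout. */
static int parse_gtags_line(const char *line, GtagsRecord *rec)
{
    int consumed = 0;
    const char *rest;

    assert(line != NULL && rec != NULL);
    if (sscanf(line, " %c %63s %lu %255s %n",
               &rec->type, rec->name, &rec->line, rec->file, &consumed) != 4)
        return 0;
    if (rec->type != 'D' && rec->type != 'R')
        return 0;

    /* Everything after the file name is the source text. If %n was not
     * reached (nothing follows the file name), treat the text as empty. */
    rest = (consumed > 0) ? line + consumed : "";
    strncpy(rec->text, rest, sizeof rec->text - 1);
    rec->text[sizeof rec->text - 1] = '\0';
    rec->text[strcspn(rec->text, "\r\n")] = '\0'; /* drop trailing newline */
    return 1;
}
```

A caller would feed each output line to parse_gtags_line() and branch on rec.type to decide whether the record is a definition or a reference.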
How to write a parser using makeSimpleReferenceTag()?
Here is an example in parsers/scheme.c. You can use makeSimpleReferenceTag() instead of makeSimpleTag() for reference tags.
[Tmain/gtags-option.d/input.scm]
1 (define name "name1") ; makeSimpleTag("name", ...)
2 (set! name "name2") ; makeSimpleReferenceTag("name", ...)
3 (set! unknown "name3") ; makeSimpleReferenceTag("unknown", ...)
(Build of ctags)
You need to use --enable-gtags option for the configure script.
$ ./configure --enable-gtags
$ make
$ sudo make install
Without --gtags, the output is the same as the original.
$ ctags -x --format=1 input.scm
name 1 input.scm (define name "name1")
With --gtags, ctags also prints reference tags, with a type string
on each output line.
$ ctags -x --format=1 --gtags input.scm
D name 1 input.scm (define name "name1")
R name 2 input.scm (set! name "name2")
R unknown 3 input.scm (set! unknown "name3")
How does it work via GNU GLOBAL?
(Build of GNU GLOBAL)
You may use --with-exuberant-ctags option for the configure script.
It's usually recognized automatically though.
$ ./configure --with-exuberant-ctags=/usr/local/bin/ctags
$ make
$ sudo make install
Preparations to use ctags as a plug-in parser of GLOBAL.
$ export GTAGSCONF=/usr/local/share/gtags/gtags.conf
$ export GTAGSLABEL=ctags
Make tag files - gtags invokes ctags internally
$ gtags
Print all definitions ('.*' is a regular expression which means ALL.)
$ global -x '.*' -d
name 1 input.scm (define name "name1")
Print all references to defined symbols
$ global -x '.*' -r
name 2 input.scm (set! name "name2")
Print all references to undefined symbols
$ global -x '.*' -s
unknown 3 input.scm (set! unknown "name3")
I believe this pull request adds a new possibility to ctags, and is harmless.
I also believe it benefits both ctags users and GLOBAL users.
Best Regards,
Shigio
http://sourceforge.net/p/ctags/mailman/message/30020186/ When I read this post, I realized how important ctags is in the free software world. I recommend that anyone interested in the main part of ctags read the post and track this issue.
Before I start working on these patches (I'm a bit busy now), I have two questions.
- Why doesn't global use the tags file format? The format is well documented and stable. On the other hand, as far as I know, the -x output is not well documented. I wonder why you didn't propose extending the tags format.
- Can you change the global side? global depends on ctags in two respects: the command line and the file format.
I have been working on the -X and --list-fields options. -X is an extended and customizable version of -x. I designed these options when I read your great post to sf.net. I hope they are a superset of what global needs from ctags.
Here is pseudo output:
$ ./ctags --list-fields
T tag name
F file name
P pattern
R Reference (or definition)
a Access (or export) of class members
f File-restricted scoping [enabled]
i Inheritance information
k Kind of tag as a single letter [enabled]
K Kind of tag as full name
l Language of source file containing tag
m Implementation information
n Line number of tag definition
s Scope of tag definition [enabled]
S Signature of routine (e.g. prototype or parameter list)
z Include the "kind:" key in kind field
t Type and name of a variable or typedef as "typeref:" field [enabled]
$ ./ctags -X="%R %T %30f %20p" input.scm
D name 1 input.scm (define name "name1")
R name 2 input.scm (set! name "name2")
R unknown 3 input.scm (set! unknown "name3")
$ ./ctags --fields=R -o -
name input.scm /^(define name/;" reference:no
name input.scm /^(set! name/;" reference:yes
unknown input.scm /^(set! unknown/;" reference:yes
global can use either the tags output or the -x/-X output. The issue is that the command line interface differs from what you originally wanted.
Again, I'm busy now and there are too many issues. However, I will work on this one, so please wait.
We have to work on three areas.
- We have to decide what ctags is: should ctags record references, or only definitions? I think such an extension is acceptable as long as we can keep the command line interface and the tags file format consistent. (As I showed in my previous post, we can keep them consistent.)
- Extend the main part. I would like to work on this area. It is not so difficult.
- Parsers (optional). One for Scheme has been submitted by @shigio. I hope a great volunteer will work on C :-P If global's built-in C parser can handle both references and definitions, adding the ability to generate a tags file to global is an interesting idea. Our ctags has xcmd to make use of it :-P
- Why global doesn't use tags file format?
Because the cxref format was enough, and it is available in older ctags too. Also, GLOBAL requires a line number for each tag.
Can you change global side?
Of course!
global can use either tags output or -x/X output.
You are right. It is really great.
The issue is that the command line interface is different from what you wanted originally.
No problem at all.
Thank you for thinking of my suggestion.
@shigio, I have written some important building blocks for implementing reference tags. I would like to understand better what a "reference tag" is.
Suppose the following small C code is given as input to ctags.
#include PRINT(X) printf("%d\n", (X))
struct point {
int x, y;
};
int result;
void len(int a, int b)
{
result = a * a + b * b;
}
void print_length (void)
{
struct point p;
p.x = getInt();
p.y = getInt();
len(p.x, p.y);
PRINT(result);
}
Which one should be reported as "reference tag" ? Could you tell me your idea?
Though I don't know whether it's helpful, I will explain how GLOBAL handles this.
(Meaning of the marks)
[D] is a definition. ([D*] is described below.)
[R] is a reference.
I assumed '#include' was a typo for '#define'.
#define [D]PRINT([R]X) [R]printf("%d\n", ([R]X))
struct [D]point {
int [D*]x, [D*]y;
};
int [D*]result;
void [D]len(int [D*]a, int [D*]b)
{
[R]result = [R]a * [R]a + [R]b * [R]b;
}
void [D]print_length (void)
{
struct [R]point [D*]p;
[R]p.[R]x = [R]getInt();
[R]p.[R]y = [R]getInt();
[R]len([R]p.[R]x, [R]p.[R]y);
[R]PRINT([R]result);
}
About [D], there seems to be no room for argument.
[Treatment of variable definitions]
Strictly speaking, [D*] should be treated as a definition. But in GLOBAL, a variable definition is treated as a reference.
Reason 1: When reading source code, the definitions of functions, macros, typedefs, structs, classes and enums are important. Variable definitions, on the other hand, are less important because they carry little information.
Reason 2: It was difficult for me to recognize a variable definition.
However, please note that GLOBAL can't classify the type of a symbol; that is, function x, macro x, enum x, variable x and struct x are all treated as just 'x'. Since ctags can classify them, it may reach a different conclusion.
[Treatment of references with no definition]
In GLOBAL, references with no definition are simply treated as references. The following are examples:
o library functions
o variables declared implicitly (Perl, Ruby, Python, etc.)
Though such symbols are written to the reference tag file (GRTAGS), they are located by the '-s (--symbol)' option, not by the '-r (--reference)' option.
$ global -x printf # not found by definition search
$ global -x printf -r # not found by reference search
$ global -x printf -s # found by other symbol search
printf 1 main.c #define PRINT(X) printf("%d\n", (X))
$ _
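The lookup rule described above can be sketched as follows. This only illustrates the rule as stated in this comment and is not GLOBAL's actual implementation; SearchClass, is_defined and classify are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SEARCH_DEFINITION, /* found by "global -x NAME"    */
    SEARCH_REFERENCE,  /* found by "global -x NAME -r" */
    SEARCH_SYMBOL      /* found by "global -x NAME -s" */
} SearchClass;

/* Return 1 if name appears in the list of defined symbols. */
static int is_defined(const char *name, const char *const defs[], size_t ndefs)
{
    for (size_t i = 0; i < ndefs; i++)
        if (strcmp(defs[i], name) == 0)
            return 1;
    return 0;
}

/* Classify one occurrence of a symbol.
 * is_definition: nonzero when the parser marked this occurrence as [D].
 * A reference whose name has no definition anywhere is an "other symbol". */
static SearchClass classify(const char *name, int is_definition,
                            const char *const defs[], size_t ndefs)
{
    assert(name != NULL);
    if (is_definition)
        return SEARCH_DEFINITION;
    return is_defined(name, defs, ndefs) ? SEARCH_REFERENCE : SEARCH_SYMBOL;
}
```

With the annotated example above, the definition list would be PRINT, point, len and print_length, so a use of len is a reference, while printf and result (a variable, which GLOBAL does not record as a definition) fall into the "other symbol" class.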
@shigio, thank you. Very informative. I will think more about ctags side specification.
(Private study note).
We can think about a concept of "kinds" of referencing (let's call it rkind).
C:
func () {
a = b.d;
foo(&c);
}
a and c are lvalue references. b is a value reference. How about d? d is both a value reference and a field reference. Who refers to it? d is referenced from b, and b is referenced from func. foo is a funcall reference.
Python:
from z import x as y
x is a ??? reference. z is a ??? reference. y is not a reference; it defines a name.
C:
#undef X
X is an "undefine" reference.
Providing kinds for references may be quite useful for building higher-level tools. However, writing a parser becomes much harder. I wonder who will implement such complete parsers, especially for languages with complex syntax.
There will be some common rkinds, like value and lvalue. There will also be language-specific rkinds.
rkinds hold many interesting possibilities, if we have enough time. Should we enter this area? Adding facilities that no parser uses may not be a good idea. I have added many such facilities; a typical one is cork.
Even for scope, I cannot explain well what it is. A richer model of source code must be defined.
For now I should not think about rkinds. Instead I can introduce a generic rkind called "unknown" or "generic".
When we introduce the reference field, some existing tags and/or kinds should be marked as references.
#undef X
X should be marked as a reference.
Do I make sense?
I would like to solve https://sourceforge.net/p/ctags/bugs/368/ .
input.h
#define X
#undef X
expected tags (--fields=+r, where r means reference):
X input.h /^#define X$/;" d file:
X input.h /^#undef X$/;" d file: reference:unknown
One possible objection is that "reference" is overkill for solving this. As @b4n wrote in https://github.com/universal-ctags/ctags/pull/221 , introducing another kind like 'u' for undef is enough.
However, the "reference" tag is too attractive to me.
Implemented. I used 'ref:' as the name of the field.
[yamato@x201]~/var/ctags-github% git diff | diffstat
entry.c | 3 +++
entry.h | 1 +
field.c | 3 +++
field.h | 1 +
get.c | 13 +++++++------
5 files changed, 15 insertions(+), 6 deletions(-)
[yamato@x201]~/var/ctags-github% ./ctags --fields='r' -o - cpp.h
X cpp.h /^#define X$/
X cpp.h /^#undef X$/;" ref:
[yamato@x201]~/var/ctags-github% git diff main/entry.h
diff --git a/main/entry.h b/main/entry.h
index bad6196..6f9aa5a 100644
--- a/main/entry.h
+++ b/main/entry.h
@@ -68,6 +68,7 @@ typedef struct sTagEntryInfo {
unsigned int placeholder :1; /* This is just a part of scope context.
Put this entry to cork queue but
don't print it to tags file. */
+ unsigned int referenced :1;
unsigned long lineNumber; /* line number of tag */
const char* pattern; /* pattern for locating source line
http://rigaux.org/language-study/syntax-across-languages/
I think that this is exactly what is proposed in #80. But we still need a temporary memory database to store the tags to make the dual pass work.
@vhda, thank you. I read #80 again. I think we can do something without a memory database.
Here we assume the tags file format is stable enough (we are working on it now :-P).
In the first pass, ctags creates 1st.tags.
Assume ctags can take hints through arguments like --add-hint-lang:ObjectiveC-kind:class=NSObject.
Consider a command that can generate hint options from a given tags file, like:
$ make-hints ./1st.tags > hints.ctags
Now you can run ctags as the second-pass tags generator like:
$ ctags --options=hints.ctags input-dir
Do I make sense?
Of course it would be cool if we could remove these manual steps. However, these manual steps are helpful for us in developing ctags. Tools other than ctags can also generate hints.
Do I make sense? Or am I misunderstanding your idea?
I think that's acceptable as a first approach. If I remember correctly we also discussed a similar solution for my "daemon mode" enhancement proposal. But if we implement this "hints" feature, then we should create a 6.0 release afterwards, as that feature is enough to make every ctags user migrate to u-ctags! ;)
Unfortunately I have zero free time at the moment, so I'm not sure I can help in the implementation of this solution. But if you keep me up to date I can try to help you review it.
(The -I option is a kind of hints option. A script in the Linux kernel uses -I heavily.)
--xformat now works.
% ./ctags -x --xformat="%n [%10K] %30N --- %C" main/field.h | head
13 [ macro] _FIELD_H --- #define _FIELD_H
18 [ enum] eFieldType --- typedef enum eFieldType { /* extension field content control */
19 [enumerator] FIELD_UNKNOWN --- FIELD_UNKNOWN = -1,
22 [enumerator] FIELD_NAME --- FIELD_NAME,
23 [enumerator] FIELD_SOURCE_FILE --- FIELD_SOURCE_FILE,
24 [enumerator] FIELD_PATTERN --- FIELD_PATTERN,
25 [enumerator] FIELD_COMPACT_SOURCE_LINE --- FIELD_COMPACT_SOURCE_LINE,
28 [enumerator] FIELD_ACCESS --- FIELD_ACCESS,
29 [enumerator] FIELD_FILE_SCOPE --- FIELD_FILE_SCOPE,
30 [enumerator] FIELD_INHERITANCE --- FIELD_INHERITANCE,
I will reserve %r and %R for printing reference information.
One more:
./ctags -x --xformat="(define-%K %N :line %n :file \"%F\")" -R main | head
(define-enum eCharacters :line 52 :file "main/read.h")
(define-enum eCppLimits :line 37 :file "main/get.c")
(define-enum eDebugLevels :line 58 :file "main/debug.h")
...
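To make the idea of a format string built from field letters concrete, here is a small sketch of an expander for a subset of the letters used above (%N, %n, %F, %K, %C, %R). It is not the ctags implementation: Tag and expand_xformat are hypothetical, and the printf-like handling of the width and the '-' flag is an assumption based on the sample output.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Minimal tag record for this sketch; not the ctags tagEntryInfo. */
typedef struct {
    const char *name;   /* %N */
    const char *file;   /* %F */
    const char *kind;   /* %K, kind as full name */
    const char *source; /* %C, compact source line */
    unsigned long line; /* %n */
    int is_reference;   /* %R prints "R" when nonzero, otherwise "D" */
} Tag;

/* Expand fmt into out (at most outsz bytes including the terminator).
 * Supports "%[-][width]LETTER" and "%%"; unknown letters expand to nothing.
 * Returns the number of characters written, excluding the terminator. */
static size_t expand_xformat(const char *fmt, const Tag *tag,
                             char *out, size_t outsz)
{
    size_t used = 0;

    assert(fmt != NULL && tag != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    while (*fmt != '\0' && used + 1 < outsz) {
        if (*fmt != '%') {
            /* Copy literal text up to the next '%' (or end of string). */
            size_t run = strcspn(fmt, "%");
            size_t space = outsz - used - 1;
            if (run > space)
                run = space;
            memcpy(out + used, fmt, run);
            used += run;
            out[used] = '\0';
            fmt += run;
            continue;
        }
        fmt++; /* skip '%' */

        int left = 0;
        if (*fmt == '-') {
            left = 1;
            fmt++;
        }
        int width = 0;
        while (isdigit((unsigned char)*fmt))
            width = width * 10 + (*fmt++ - '0');

        char number[32];
        const char *value = "";
        switch (*fmt) {
        case 'N': value = tag->name; break;
        case 'F': value = tag->file; break;
        case 'K': value = tag->kind; break;
        case 'C': value = tag->source; break;
        case 'R': value = tag->is_reference ? "R" : "D"; break;
        case 'n':
            snprintf(number, sizeof number, "%lu", tag->line);
            value = number;
            break;
        case '%': value = "%"; break;
        default: break; /* unknown letter: expands to nothing */
        }
        if (*fmt != '\0')
            fmt++;

        /* Pad to the requested width, left- or right-aligned. */
        size_t room = outsz - used;
        int n = snprintf(out + used, room, left ? "%-*s" : "%*s", width, value);
        if (n < 0)
            break;
        used += ((size_t)n < room) ? (size_t)n : room - 1;
    }
    return used;
}
```

Keeping the letter-to-field mapping in one switch is what makes such a scheme easy to extend: reserving %r and %R only requires adding cases.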
Random thought: if we used the "normal" tags output format, wouldn't we be able to reuse at least part of the existing readtags code?
I think mine is better. Mine is tightly integrated with field descriptors and is extensible.
The initial version of --xformat option is submitted in #645.
While implementing --xformat, I realized that the scope, inherits, and typeref
fields are closely related to the ref field. I have to study this area more.
@masatake regarding this specific PR, does it make sense to depend on gtags for "reference" identification? Or will we start implementing this as "typeref" in the parsers?
@vhda, sorry for the late reply. I had the same health trouble again.
I think universal ctags should provide the "interface" for references: the tags format and the command line.
As for the tags format, a "ref:" field may be needed. However, I'm still thinking about what is best. As for the command line, ctags should at least accept --fields=+r. More things may be needed. (The biggest issue in this area is the definition of "reference". I'm not yet sure what is a reference and what is not.)
The typeref field is closely related to your plan, the multi-pass parser for multi-file (MPMF). My understanding is that typeref and MPMF are for "definitions", not "references". Consider the following short C program:
typedef struct point { int x, y } POINT;
POINT p;
$ ./ctags --fields='t' --kinds-c='*-m' -o - input.c
POINT input.c /^typedef struct point { int x, y } POINT;$/;" typeref:struct:point
p input.c /^POINT p;$/
point input.c /^typedef struct point { int x, y } POINT;$/
p doesn't have a typeref field. With a multi-pass parser, ctags can generate the following tag entry for p:
p input.c /^POINT p;$/ typeref:typedef:POINT
As this example shows, MP/MPMF is meaningful for definition tags. It may be meaningful for reference tags too, but you don't have to wait for the reference-tag code to be introduced before implementing MPMF.
Once we introduce the interface for references, the next step is implementation. My understanding is that ctags is a provider of tags files, and gtags is a consumer of tags files. So we should not depend on gtags.
Does this answer your question?
@vhda, I think I misunderstood what you wrote. You used "typeref". I guessed you used it as a field name. Now I guess you used it as a kind name.
Hey @shigio, a draft version of gtags support works on my laptop.
[yamato@x201]~/var/ctags-github% cat input.sh
source commonFuncs.sh
foo()
{
return 0
}
[yamato@x201]~/var/ctags-github% ./ctags -o - input.sh
foo input.sh /^foo() $/;" f
[yamato@x201]~/var/ctags-github% ./ctags -o - --extra=+r input.sh
commonFuncs.sh input.sh /^source commonFuncs.sh$/;" s
foo input.sh /^foo() $/;" f
[yamato@x201]~/var/ctags-github% ./ctags -o - --extra=+r --fields=+r input.sh
commonFuncs.sh input.sh /^source commonFuncs.sh$/;" s role:generic
foo input.sh /^foo() $/;" f
[yamato@x201]~/var/ctags-github% ./ctags -x --_xformat="%R %-16N %4n %-16F %C" --extra=+r --fields=+r input.sh
D foo 3 input.sh foo()
R commonFuncs.sh 1 input.sh source commonFuncs.sh
commonFuncs.sh is not a definition, so it should be captured as a reference tag.
Capturing reference tags is activated with the --extra=+r option.
--fields=+r lets ctags print the "role:" field. ctags assigns a value to the role field only if a tag is a reference.
%R in --_xformat means:
[yamato@x201]~/var/ctags-github% ./ctags --list-fields | grep Marker
R NONE Marker(R or D) representing whether tag is definition or reference format-char off
I introduced a new concept, "role", instead of rkind.
A kind represents what a tag is. A role represents how it is referred to.
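The kind/role distinction, and the rule that the role: field is printed only for reference tags, can be sketched like this. TagEntry and format_tag_entry are hypothetical names for this example and are not part of ctags; the tab-separated layout follows the regular tags file format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *file;
    const char *pattern; /* text placed between /^ and $/ */
    char kind;           /* what the tag is, e.g. 'f' or 's' */
    const char *role;    /* how it is referred to; NULL for a definition */
} TagEntry;

/* Write one tags line into buf. The "role:" field is emitted only for
 * reference tags, mirroring the behaviour described for --fields=+r.
 * Returns the snprintf() result. */
static int format_tag_entry(const TagEntry *e, char *buf, size_t bufsz)
{
    assert(e != NULL && buf != NULL);
    if (e->role != NULL && strlen(e->role) > 0)
        return snprintf(buf, bufsz, "%s\t%s\t/^%s$/;\"\t%c\trole:%s",
                        e->name, e->file, e->pattern, e->kind, e->role);
    return snprintf(buf, bufsz, "%s\t%s\t/^%s$/;\"\t%c",
                    e->name, e->file, e->pattern, e->kind);
}
```

For the input.sh example, an entry for commonFuncs.sh with kind 's' and role "generic" gets the extra role:generic field, while the definition of foo is written without it.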
There are more areas I have to clean up, but it works anyway. Python's namespaces, header files in C, and undef in C are handled as reference tags.
#undef X
X is a macro kind tag with an "undefined" or "undef" role. I will also look at your patch for scheme. The role for Y in the following code may be "lvalue":
(set! Y 1)
Good night.
This flexibility is great.
I have added a new test to configure.ac of GLOBAL to detect these options. (I understand that these are alpha version.) You can see it on the repository: [http://cvs.savannah.gnu.org/viewvc/global/global/]
I think this is a big step. Thank you masatake san.
@shigio, I haven't looked at GLOBAL's configure yet, but I recommend checking for R in the --list-fields output.
This will be the most stable way.
% ./ctags --list-fields | grep ^R
R NONE Marker(R or D) representing whether tag is definition or reference format-char off
I would like to include the reference/role feature in the initial release of universal-ctags.
I kept this open until reviewing/merging the scheme-related code of the original PR. I will work on this topic after releasing 1.0.0.
os: mac os 10.11.5
universal ctags: installed by brew
global: 6.5.4, installed by brew (edited its formula to add the configure option "--with-universal-ctags")
I tried GNU GLOBAL and universal ctags together on a big project, PostgreSQL.
I ran gtags with gtags --gtagslabel=new-ctags.
Then gtags calls ctags as follows (from ps -ef | grep ctags):
ctags --langmap=Asm:.asm.ASM.s.S.A51.29k.29K,Asp:.asp.asa,Awk:.awk.gawk.mawk,Basic:.bas.bi.bb.pb,BETA:.bet,C:.c,C++:.c++.cc.cp.cpp.cxx.h.h++.hh.hp.hpp.hxx.inl,C#:.cs,Cobol:.cbl.cob.CBL.COB,DosBatch:.bat.cmd,Eiffel:.e,Erlang:.erl.ERL.hrl.HRL,Flex:.as.mxml,Fortran:.f.for.ftn.f77.f90.f95.f03.f08.f15,HTML:.htm.html,Java:.java,JavaScript:.js,Lisp:.cl.clisp.el.l.lisp.lsp,Lua:.lua,Make:.mak.mk,MatLab:.m,OCaml:.ml.mli.aug,Pascal:.p.pas,Perl:.pl.pm.plx.perl.ph,PHP:.php.php3.phtml.php4.php5.php7,Python:.py.pyx.pxd.pxi.scons,REXX:.rexx.rx,Ruby:.rb.ruby,Scheme:.SCM.SM.sch.scheme.scm.sm,Sh:.sh.SH.bsh.bash.ksh.zsh.ash,SLang:.sl,SML:.sml.sig,SQL:.sql,Tcl:.tcl.tk.wish.itcl,Tex:.tex,Vera:.vr.vri.vrh,Verilog:.v,VHDL:.vhdl.vhd,Vim:.vim.vba,YACC:.y,Ada:.adb.ads.Ada,Ant:.ant,Clojure:.clj,CoffeeScript:.coffee,CSS:.css,ctags:.ctags,D:.d.di,Diff:.diff.patch,DTS:.dts.dtsi,Falcon:.fal.ftd,gdbinit:.gdb,Go:.go,JSON:.json,m4:.m4.spt,ObjectiveC:.mm,Perl6:.p6.pm6.pl6,R:.r.R.q,reStructuredText:.rest.reST.rst,Rust:.rs,SystemVerilog:.sv.svh.svi,WindRes:.rc,Zephir:.zep --_xformat=%R %-16N %4n %-16F %C --extra=+r --fields=+r -xu --filter --filter-terminator=###terminator###^J
It looks good.
But I cannot find references to a simple function with global, for example with global -r heap_insert.
@shigio @masatake How is this feature doing now? Is there something wrong with what I did? Thanks.
There is nothing wrong with what you did. Though there is an infrastructure for recording reference tags, only a few parsers use it.
Hi @masatake, Please let me ask you a question about the role. "Universal Ctags Documentation Release 0.3.0" says as follows:
A reference tag may have "role" information representing how it is referenced. Universal-ctags prints the role information when the r field is enabled with --fields=+r. If a tag doesn’t have a specialized role, generic is use as the name of role.
How can I express "a tag doesn't have a specialized role"? It seems that makeSimpleRefTag() requires an enabled 'roleIndex' as the fourth argument. What should I pass for it?
Or should I always code as follows?
typedef enum {
K_FUNCTION, K_SET
} schemeKind;
/* Added */
typedef enum {
R_SCHEME_GENERIC,
} schemeRole;
/* Added */
static roleDesc SchemeRoles [] = {
{ true, "generic", "generic" },
};
static kindDefinition SchemeKinds [] = {
{ true, 'f', "function", "functions", false},
{ true, 's', "set", "sets",
/* Added */
.referenceOnly = true, ATTACH_ROLES(SchemeRoles)},
};
...
makeSimpleRefTag("aaa", "set", K_SET, R_SCHEME_GENERIC);
It might be convenient if roleIndex == -1 meant 'generic'.
// 'roleIndex == -1' means 'generic' role.
makeSimpleRefTag("aaa", "set", K_SET, -1);
What do you think? I hope it does not go against your intentions.
Currently, a parser author must define R_SCHEME_GENERIC. Furthermore, such a role must be defined for each kind.
-1 is reserved as
#define ROLE_INDEX_DEFINITION -1
You can define a macro that defines a generic role:
#define defineGenericRole(LANG,Lang,lang,KIND,Kind) \
typedef enum { \
LANG##_##KIND##_GENERIC_ROLE, \
} lang##Kind##Role; \
\
static roleDesc Lang##Kind##Roles [] = { \
{ true, "generic", \
"generic" }, \
}
defineGenericRole(SCHEME,Scheme,scheme,SET,set);
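For reference, here is a self-contained sketch of what the macro produces. The roleDesc struct below is a reduced stand-in with only the three members used in the initializer above (an assumption; the real type is defined in the ctags sources), and scheme_set_role_name is a helper invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for ctags' roleDesc, reduced to the members used in the thread's
 * initializer. This is an assumption made so the sketch compiles alone. */
typedef struct {
    bool enabled;
    const char *name;
    const char *description;
} roleDesc;

/* The macro from the comment above, unchanged apart from layout. */
#define defineGenericRole(LANG,Lang,lang,KIND,Kind)  \
    typedef enum {                                    \
        LANG##_##KIND##_GENERIC_ROLE,                 \
    } lang##Kind##Role;                               \
                                                      \
    static roleDesc Lang##Kind##Roles [] = {          \
        { true, "generic", "generic" },               \
    }

/* Expands to the enum constant SCHEME_SET_GENERIC_ROLE, the enum type
 * schemesetRole and the array SchemesetRoles. Token pasting keeps "set"
 * in lower case; passing Set instead would give schemeSetRole. */
defineGenericRole(SCHEME,Scheme,scheme,SET,set);

/* Helper for this sketch only: look up a role name by index. */
static const char *scheme_set_role_name(int roleIndex)
{
    assert(roleIndex >= 0 &&
           (size_t)roleIndex < sizeof SchemesetRoles / sizeof SchemesetRoles[0]);
    return SchemesetRoles[roleIndex].name;
}
```

The generated constant is what a parser would pass as the role index, so -1 can stay reserved for ROLE_INDEX_DEFINITION.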