
reftag in scheme (Was: adding reference tag function)

Open shigio opened this issue 10 years ago • 33 comments

Hello, this pull request adds the ability to generate reference tags. For now, it works in combination with GNU GLOBAL (*1).

*1: GNU GLOBAL (http://www.gnu.org/software/global/) is a source code tagging system. It also works as a tagging framework that offers various features, such as the following:

  • Project-based concept
  • High performance & machine architecture independent
  • Incremental updating of tag files
  • Tag search and path search using regular expression
  • ... (please see http://www.gnu.org/software/global/)

With this modification, we can write a parser that handles reference tags with little extra effort. Such a parser is usable by both ctags (definitions) and GNU GLOBAL (definitions and references).

What is included?

The modification includes the following.

  • makeSimpleReferenceTag() function: This function makes a reference tag. Currently, ctags prints it only when both the -x option and the --gtags option (explained later) are specified. Basically, you may use the function for every symbol that is not a definition.

  • Modified scheme parser (parsers/scheme.c): This is an example parser showing how to use the makeSimpleReferenceTag() function.

  • --gtags option and test: If this option is specified together with the -x option, ctags prints reference tags as well as definition tags, with a type string at the head of each output line. The type string is one of the following:

    D: definition
    R: reference

    GNU GLOBAL has understood this format since global-6.2.6.

Since the modified ctags is backward compatible, its output is the same as the original unless you use the --gtags option.

How to write a parser using makeSimpleReferenceTag()?

Here is an example in parsers/scheme.c. You can use makeSimpleReferenceTag() instead of makeSimpleTag() for reference tags.

[Tmain/gtags-option.d/input.scm]

1   (define name "name1")       ; makeSimpleTag("name", ...)
2   (set! name "name2")         ; makeSimpleReferenceTag("name", ...)
3   (set! unknown "name3")      ; makeSimpleReferenceTag("unknown", ...)
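The choice between makeSimpleTag() and makeSimpleReferenceTag() in the listing above comes down to one decision per form. The following standalone C sketch is not the real parsers/scheme.c: classify_scheme_line() is a hypothetical helper that assumes one form per line, reports 'D' for `(define NAME` and 'R' for `(set! NAME` (the same markers that --gtags prints), and deliberately ignores function-style `(define (f x) ...)` forms.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ctags: classify one line of Scheme.
 * Returns 'D' for "(define NAME", 'R' for "(set! NAME", 0 otherwise.
 * The symbol is copied into name; size must be at least 1. */
static char classify_scheme_line(const char *line, char *name, size_t size)
{
    char marker;
    size_t n = 0;

    while (isspace((unsigned char)*line))
        line++;
    if (strncmp(line, "(define ", 8) == 0) {
        marker = 'D';               /* definition: makeSimpleTag() */
        line += 8;
    } else if (strncmp(line, "(set! ", 6) == 0) {
        marker = 'R';               /* reference: makeSimpleReferenceTag() */
        line += 6;
    } else {
        return 0;
    }
    while (isspace((unsigned char)*line))
        line++;
    /* A symbol ends at whitespace or a parenthesis. */
    while (*line != '\0' && !isspace((unsigned char)*line)
           && *line != '(' && *line != ')' && n + 1 < size)
        name[n++] = *line++;
    name[n] = '\0';
    return n > 0 ? marker : 0;      /* "(define (f x) ...)" yields 0 here */
}
```

Run over the three lines of input.scm, this yields D for name on line 1 and R for name and unknown on lines 2 and 3, which matches the --gtags output shown below.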

(Build of ctags) You need to pass the --enable-gtags option to the configure script.

$ ./configure --enable-gtags
$ make
$ sudo make install

Without --gtags, the output is the same as the original.

$ ctags -x --format=1 input.scm
name                1 input.scm        (define name "name1")

With --gtags, ctags also prints reference tags, prefixing each output line with a type string.

$ ctags -x --format=1 --gtags input.scm
D name                1 input.scm        (define name "name1")
R name                2 input.scm        (set! name "name2")
R unknown             3 input.scm        (set! unknown "name3")

How does it work via GNU GLOBAL?

(Build of GNU GLOBAL) You may pass the --with-exuberant-ctags option to the configure script, though ctags is usually detected automatically.

$ ./configure --with-exuberant-ctags=/usr/local/bin/ctags
$ make 
$ sudo make install

Preparation for using ctags as a plug-in parser of GLOBAL:

$ export GTAGSCONF=/usr/local/share/gtags/gtags.conf
$ export GTAGSLABEL=ctags

Make tag files (gtags invokes ctags internally):

$ gtags

Print all definitions ('.*' is a regular expression which means ALL.)

$ global -x '.*' -d
name                1 input.scm        (define name "name1")

Print all references to defined symbols

$ global -x '.*' -r
name                2 input.scm        (set! name "name2")

Print all references to undefined symbols

$ global -x '.*' -s
unknown             3 input.scm        (set! unknown "name3")
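The three queries above partition the D/R records: -d lists definitions, -r lists references whose name is also defined somewhere, and -s lists references with no definition at all. A minimal C sketch of that partition follows, assuming the records are already parsed into a hypothetical `struct tag_record` array; GLOBAL itself works from its GTAGS/GRTAGS databases, not from code like this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory record for one line of `ctags -x --gtags` output. */
struct tag_record {
    char marker;            /* 'D' (definition) or 'R' (reference) */
    const char *name;
    int line;
};

/* Which of GLOBAL's query options would list a record. */
enum global_query { QUERY_DEFINITION, QUERY_REFERENCE, QUERY_SYMBOL };

static int has_definition(const struct tag_record *recs, size_t n,
                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i].marker == 'D' && strcmp(recs[i].name, name) == 0)
            return 1;
    return 0;
}

/* -d: definitions; -r: references to defined names; -s: the rest. */
static enum global_query classify_record(const struct tag_record *recs,
                                         size_t n, size_t i)
{
    if (recs[i].marker == 'D')
        return QUERY_DEFINITION;
    return has_definition(recs, n, recs[i].name) ? QUERY_REFERENCE
                                                 : QUERY_SYMBOL;
}
```

The linear scan keeps the sketch short; a real index would use a hash table or a sorted tag database for the definition lookup.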

I believe this pull request adds a new capability to ctags and is harmless. I also believe it benefits both ctags users and GLOBAL users.

Best Regards, Shigio

shigio, Sep 22 '15 16:09

http://sourceforge.net/p/ctags/mailman/message/30020186/ When I read this post, I realized how important ctags is in the free software world. I recommend that anyone interested in the ctags main part read the post and track this issue.

Before I start working on these patches (I'm a bit busy now), I have two questions.

  1. Why doesn't global use the tags file format? The format is well documented and stable. On the other hand, as far as I know, the -x output is not well documented. Why didn't you propose extending the tags format?
  2. Can you change the global side? global depends on ctags in two respects: the command line and the file format.

I have been working on the -X and --list-fields options. -X is an extended, customizable version of -x. I designed these options after reading your great post on sf.net. I hope they are a superset of what global needs from ctags.

Here is pseudo output:

$ ./ctags --list-fields
T   tag name
F   file name
P   pattern
R   Reference (or definition)
a   Access (or export) of class members
f   File-restricted scoping [enabled]
i   Inheritance information
k   Kind of tag as a single letter [enabled]
K   Kind of tag as full name
l   Language of source file containing tag
m   Implementation information
n   Line number of tag definition
s   Scope of tag definition [enabled]
S   Signature of routine (e.g. prototype or parameter list)
z   Include the "kind:" key in kind field
t   Type and name of a variable or typedef as "typeref:" field [enabled]

$ ./ctags -X="%R %T %30f    %20p" input.scm
D name                1 input.scm        (define name "name1")
R name                2 input.scm        (set! name "name2")
R unknown             3 input.scm        (set! unknown "name3")

$ ./ctags --fields=R -o -
name    input.scm   /^(define name/;"   reference:no
name    input.scm   /^(set! name/;" reference:yes
unknown input.scm   /^(set! unknown/;"  reference:yes
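One argument for the tags format is that it is easy to consume. A minimal C sketch of a consumer follows, assuming the proposed `reference:yes|no` extension field shown above (a field name from this proposal, not a settled format) and tab-separated extension fields after the `;"` separator. It does not handle a `;"` sequence appearing inside the search pattern itself.

```c
#include <string.h>

/* Returns 1 if a tags line carries the (proposed) extension field
 * "reference:yes", 0 otherwise. Extension fields follow the `;"`
 * separator and are separated by tabs. */
static int is_reference_tag(const char *line)
{
    static const char field[] = "reference:yes";
    const char *p = strstr(line, ";\"");

    if (p == NULL)
        return 0;                   /* no extension fields at all */
    p += 2;
    while (*p != '\0' && *p != '\n') {
        size_t len;

        while (*p == '\t')          /* skip field separators */
            p++;
        len = strcspn(p, "\t\n");   /* length of this field */
        if (len == sizeof field - 1 && strncmp(p, field, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```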

global can use either the tags output or the -x/-X output. The issue is that the command-line interface differs from what you originally wanted.

Again, I'm busy now and there are many open issues. However, I will work on this one, so please wait.

masatake, Sep 24 '15 06:09

We have to work on three areas.

  1. We have to decide what ctags is: should ctags record references, or only definitions? I think such an extension is acceptable as long as we keep the command-line interface and the tags file format consistent. (As I showed in my previous post, we can keep them consistent.)
  2. Extend the main part. I would like to work on this area; it is not so difficult.
  3. Parsers (optional). A parser for Scheme has been submitted by @shigio. I hope a great volunteer will work on C :-P If global's built-in C parser can handle both references and definitions, adding the ability to generate a tags file to global is an interesting idea. Our ctags has xcmd to utilize it :-P

masatake, Sep 24 '15 07:09

  1. Why doesn't global use the tags file format?

Because the cxref format was sufficient and is also available in older ctags. In addition, GLOBAL requires a line number for each tag.

Can you change the global side?

Of course!

global can use either the tags output or the -x/-X output.

You are right. It is really great.

The issue is that the command line interface is different from what you wanted originally.

No problem at all.

Thank you for thinking of my suggestion.

shigio, Sep 24 '15 09:09

@shigio, I have written some important building blocks for implementing "reference tags". I would like to study more about what a "reference tag" is.

Consider the following small C code given as input to ctags.

#include PRINT(X) printf("%d\n", (X))

struct point {
  int x, y;
};


int result;
void len(int a, int b)
{
  result =  a * a + b * b;
}

void print_length (void)
{
  struct point p;
  p.x = getInt();
  p.y = getInt();

  len(p.x, p.y);
  PRINT(result);
}

Which ones should be reported as "reference tags"? Could you tell me your idea?

masatake, Oct 06 '15 16:10

I don't know whether it's helpful, but I will explain how GLOBAL handles this.

(Meaning of marks) [D] is a definition ([D*] is described below). [R] is a reference.

I assumed '#include' was a typo for '#define'.


#define [D]PRINT([R]X) [R]printf("%d\n", ([R]X))
struct [D]point {
  int [D*]x, [D*]y;
};
int [D*]result;
void [D]len(int [D*]a, int [D*]b)
{
  [R]result =  [R]a * [R]a + [R]b * [R]b;
}
void [D]print_length (void)
{
  struct [R]point [D*]p;
  [R]p.[R]x = [R]getInt();
  [R]p.[R]y = [R]getInt();
  [R]len([R]p.[R]x, [R]p.[R]y);
  [R]PRINT([R]result);
}

About [D], there seems to be no room for argument.

[Treatment of variable definitions] Strictly speaking, [D*] should be treated as a definition. But in GLOBAL, a variable definition is treated as a reference.

Reason 1: When reading source code, definitions of functions, macros, typedefs, structs, classes, and enums are important. A variable definition, on the other hand, is less important because it carries little information.

Reason 2: It was difficult for me to recognize a variable definition.

However, please note that GLOBAL can't classify symbols by type: function x, macro x, enum x, variable x, and struct x are all treated as just 'x'. Since ctags can classify them, ctags may reach a different conclusion.

[Treatment of references with no definition] In GLOBAL, references with no definition are simply treated as references. This applies to:

  • library functions
  • implicitly declared variables (Perl, Ruby, Python, etc.)

Although such symbols are written to the reference tag file (GRTAGS), they are located by the '-s (--symbol)' option, not by the '-r (--reference)' option.

$ global -x printf       # not found by definition search
$ global -x printf -r    # not found by reference search
$ global -x printf -s    # found by other symbol search
printf              1 main.c           #define PRINT(X) printf("%d\n", (X))
$ _

shigio, Oct 07 '15 09:10

@shigio, thank you. Very informative. I will think more about ctags side specification.

masatake, Oct 08 '15 12:10

(Private study note).

We can think about a concept of "kinds" in referencing (let's call it rkind).

C:

func () {
   a = b.d;
   foo(&c);
}

a and c are lvalue references. b is a value reference. How about d? d is both a value reference and a field reference. Who refers to it? d is referenced from b, and b is referenced from func. foo is a funcall reference.
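To make the rkind idea concrete, here is one way the classification above could be recorded. Everything in this sketch is hypothetical (no such enum or table exists in ctags); rkinds are stored as a bit set because, as the note says, d is both a value and a field reference.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reference kinds ("rkinds") from the study note. */
enum rkind { RK_LVALUE, RK_VALUE, RK_FIELD, RK_FUNCALL };

struct reference {
    const char *name;
    unsigned kinds;             /* bit set of (1u << enum rkind) */
    const char *referrer;       /* who refers to it */
};

/* func () { a = b.d; foo(&c); } classified as in the note. */
static const struct reference func_refs[] = {
    { "a",   1u << RK_LVALUE,                     "func" },
    { "b",   1u << RK_VALUE,                      "func" },
    { "d",   (1u << RK_VALUE) | (1u << RK_FIELD), "b"    },
    { "foo", 1u << RK_FUNCALL,                    "func" },
    { "c",   1u << RK_LVALUE,                     "func" },
};

/* Returns nonzero if the named reference carries rkind k. */
static int has_rkind(const char *name, enum rkind k)
{
    for (size_t i = 0; i < sizeof func_refs / sizeof func_refs[0]; i++)
        if (strcmp(func_refs[i].name, name) == 0)
            return (int)((func_refs[i].kinds >> k) & 1u);
    return 0;
}
```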

Python:

   from z import x as y

x is a ??? reference. z is a ??? reference. y is not a reference; it defines a name.

C

#undef X

X is an "undefine" reference.

Providing kinds for references may be quite useful for building upper-layer tools. However, it makes writing a parser much harder. I wonder who will implement such complete parsers, especially for languages with complex syntax.

There will be some common rkinds, like value and lvalue, and some language-specific rkinds.

There are many interesting things to explore in rkinds if we have enough time. Should we enter this area? Adding facilities that no parser uses may not be a good idea, and I have already added many such facilities; a typical one is cork.

Even scope is something I cannot explain well. A richer model of source code must be defined.

masatake, Oct 21 '15 09:10

For now, I should not think about rkind. Instead, I can introduce a generic rkind called "unknown" or "generic".

When we introduce the reference field, some existing tags and/or kinds should be marked as references.

#undef X

X should be marked as a reference.

Do I make sense?

I would like to solve https://sourceforge.net/p/ctags/bugs/368/ .

input.h

#define X
#undef X

Expected tags (--fields=+r, where r means reference):

X input.h /^#define X$/;" d file:
X input.h /^#undef X$/;" d file: reference:unknown

One possible objection is that "reference" is overkill for solving this: as @b4n wrote in https://github.com/universal-ctags/ctags/pull/221 , introducing another kind like 'u' for undef would be enough.

However, the "reference" tag is too attractive to me.

masatake, Oct 21 '15 10:10

Implemented. I used 'ref:' as the field name.

[yamato@x201]~/var/ctags-github% git diff | diffstat
 entry.c |    3 +++
 entry.h |    1 +
 field.c |    3 +++
 field.h |    1 +
 get.c   |   13 +++++++------
 5 files changed, 15 insertions(+), 6 deletions(-)
[yamato@x201]~/var/ctags-github% ./ctags --fields='r' -o - cpp.h
X   cpp.h   /^#define X$/
X   cpp.h   /^#undef X$/;"  ref:
[yamato@x201]~/var/ctags-github% git diff main/entry.h
diff --git a/main/entry.h b/main/entry.h
index bad6196..6f9aa5a 100644
--- a/main/entry.h
+++ b/main/entry.h
@@ -68,6 +68,7 @@ typedef struct sTagEntryInfo {
    unsigned int placeholder    :1;  /* This is just a part of scope context.
                        Put this entry to cork queue but
                        don't print it to tags file. */
+   unsigned int referenced     :1;

    unsigned long lineNumber;     /* line number of tag */
    const char* pattern;          /* pattern for locating source line
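The effect of the new `referenced` bit on the output can be sketched in isolation. The `struct tag_entry` below is a hypothetical, simplified stand-in for sTagEntryInfo, and format_tag() only mimics the session above: the `ref:` field (with an empty value, as printed there) is appended only for entries marked as references.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for sTagEntryInfo: only what is
 * needed to show the effect of the new `referenced` bit. */
struct tag_entry {
    const char *name;
    const char *file;
    const char *pattern;           /* e.g. "/^#undef X$/" */
    unsigned int referenced :1;
};

/* Format one tags line into buf. The `;"` separator and the `ref:`
 * field are emitted only for references, as in the session above.
 * Returns what snprintf returns (the untruncated length). */
static int format_tag(char *buf, size_t size, const struct tag_entry *e)
{
    return snprintf(buf, size, "%s\t%s\t%s%s", e->name, e->file,
                    e->pattern, e->referenced ? ";\"\tref:" : "");
}
```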

masatake, Oct 22 '15 07:10

http://rigaux.org/language-study/syntax-across-languages/

masatake, Oct 22 '15 08:10

I think that this is exactly what is proposed in #80. But we still need a temporary memory database to store the tags to make the dual pass work.

vhda, Oct 22 '15 14:10

@vhda, thank you. I read #80 again. I think we can do something without an in-memory database. Here we assume the tags file format is stable enough (we are working on it now :-P). In the first pass, ctags creates 1st.tags. Assume ctags can take hints with arguments like --add-hint-lang:ObjectiveC-kind:class=NSObject. Consider a command that can generate hint options from a given tags file, like:

   $ make-hints ./1st.tags > hints.ctags

Now you can run ctags as the second-pass tags generator, like:

  $ ctags --options=hints.ctags input-dir

Do I make sense?

Of course, it would be cool if we could remove these manual steps. However, these manual steps are helpful for us when developing ctags. Tools other than ctags can also generate hints.

Do I make sense? Or have I misunderstood your idea?
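The make-hints step in this proposal could be as small as the sketch below. Both the --add-hint-lang option and the make-hints command are only proposed in this comment, so make_hint() is hypothetical. It also assumes the first extension field is the one-letter kind, and takes as a parameter the kind letter that the parser uses for classes, since that letter is an assumption too.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical "make-hints" step: turn one tags line into a hint option.
 * Assumes extension fields start with the one-letter kind (`;"<TAB>c`)
 * and that class_kind is the letter the parser uses for classes.
 * Returns 1 and fills out if the line is a class tag, 0 otherwise. */
static int make_hint(const char *tags_line, const char *lang, char class_kind,
                     char *out, size_t size)
{
    const char *tab = strchr(tags_line, '\t');   /* end of the tag name */
    const char *ext = strstr(tags_line, ";\"\t");

    if (tab == NULL || ext == NULL)
        return 0;                   /* malformed, or no kind field */
    ext += 3;
    if (ext[0] != class_kind
        || (ext[1] != '\0' && ext[1] != '\t' && ext[1] != '\n'))
        return 0;                   /* not a class tag */
    snprintf(out, size, "--add-hint-lang:%s-kind:class=%.*s",
             lang, (int)(tab - tags_line), tags_line);
    return 1;
}
```

Applied to every line of 1st.tags, the emitted options would form the hints.ctags file passed to the second pass with --options.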

masatake, Oct 22 '15 15:10

I think that's acceptable as a first approach. If I remember correctly we also discussed a similar solution for my "daemon mode" enhancement proposal. But if we implement this "hints" feature, then we should create a 6.0 release afterwards, as that feature is enough to make every ctags user migrate to u-ctags! ;)

Unfortunately I have zero free time at the moment, so I'm not sure I can help in the implementation of this solution. But if you keep me up to date I can try to help you review it.

vhda, Oct 22 '15 16:10

(The -I option is a kind of hints option. A script in the Linux kernel uses -I heavily.)

masatake, Oct 23 '15 01:10

--xformat now works.

% ./ctags -x --xformat="%n [%10K] %30N --- %C" main/field.h | head
13 [     macro]                       _FIELD_H --- #define _FIELD_H
18 [      enum]                     eFieldType --- typedef enum eFieldType { /* extension field content control */
19 [enumerator]                  FIELD_UNKNOWN --- FIELD_UNKNOWN = -1,
22 [enumerator]                     FIELD_NAME --- FIELD_NAME,
23 [enumerator]              FIELD_SOURCE_FILE --- FIELD_SOURCE_FILE,
24 [enumerator]                  FIELD_PATTERN --- FIELD_PATTERN,
25 [enumerator]      FIELD_COMPACT_SOURCE_LINE --- FIELD_COMPACT_SOURCE_LINE,
28 [enumerator]                   FIELD_ACCESS --- FIELD_ACCESS,
29 [enumerator]               FIELD_FILE_SCOPE --- FIELD_FILE_SCOPE,
30 [enumerator]              FIELD_INHERITANCE --- FIELD_INHERITANCE,

I will reserve %r and %R for printing reference information.
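The format letters used in this thread (%n, %N, %K, %F, %C, and the reserved %R) with an optional '-' and field width behave much like printf conversions. The expander below is a hypothetical sketch of that behavior, not ctags' implementation, which is driven by its field descriptors. It relies on the C rule that a negative width passed through `%*s` means left-justification, and it copies unknown letters through verbatim.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical tag record holding the fields used in the examples. */
struct xtag {
    char marker;            /* %R: 'D' or 'R' */
    const char *name;       /* %N */
    const char *kind;       /* %K: full kind name */
    const char *file;       /* %F */
    const char *line_text;  /* %C: compact source line */
    unsigned long line;     /* %n */
};

/* Expand an --xformat-like string into out (size >= 1). Supports an
 * optional '-' (left-justify) and a width, e.g. "%-16N" or "%10K". */
static void expand_xformat(const char *fmt, const struct xtag *t,
                           char *out, size_t size)
{
    size_t used = 0;

    out[0] = '\0';
    while (*fmt != '\0' && used + 1 < size) {
        const char *spec = fmt;
        int left = 0, width = 0, n;
        char num[32], mark[2] = { t->marker, '\0' };
        const char *val;

        if (*fmt != '%') {                  /* ordinary character */
            out[used++] = *fmt++;
            out[used] = '\0';
            continue;
        }
        fmt++;
        if (*fmt == '-') { left = 1; fmt++; }
        while (isdigit((unsigned char)*fmt))
            width = width * 10 + (*fmt++ - '0');
        switch (*fmt) {
        case 'R': val = mark;         break;
        case 'N': val = t->name;      break;
        case 'K': val = t->kind;      break;
        case 'F': val = t->file;      break;
        case 'C': val = t->line_text; break;
        case 'n': snprintf(num, sizeof num, "%lu", t->line); val = num; break;
        case '%': val = "%";          break;
        default:  val = NULL;         break;
        }
        if (val == NULL) {                  /* unknown: emit '%' literally */
            out[used++] = '%';
            out[used] = '\0';
            fmt = spec + 1;
            continue;
        }
        fmt++;                              /* skip the conversion letter */
        n = snprintf(out + used, size - used, "%*s",
                     left ? -width : width, val);
        if (n < 0 || (size_t)n >= size - used) {
            used = size - 1;                /* truncated: stop here */
            break;
        }
        used += (size_t)n;
    }
}
```

With a record for foo on line 3 of input.sh, the format "%n [%10K] %N" expands to `3 [  function] foo`, matching the right-aligned kind column in the output above.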

masatake, Oct 26 '15 19:10

One more:

./ctags -x --xformat="(define-%K %N :line %n :file \"%F\")" -R main | head 
(define-enum eCharacters :line 52 file: "main/read.h")
(define-enum eCppLimits :line 37 file: "main/get.c")
(define-enum eDebugLevels :line 58 file: "main/debug.h")
...

masatake, Oct 26 '15 19:10

Random thought: if we used the "normal" tags output format, wouldn't we be able to reuse at least part of the already available readtags code?

vhda, Oct 26 '15 19:10

Random thought: if we used the "normal" tags output format, wouldn't we be able to reuse at least part of the already available readtags code?

I think mine is better. Mine is tightly integrated with field descriptors and is extensible.

masatake, Oct 27 '15 05:10

The initial version of the --xformat option is submitted in #645. While implementing --xformat, I realized that the scope, inherits, and typeref fields are closely related to the ref field. I have to study this area more.

masatake, Oct 28 '15 17:10

@masatake regarding this specific PR, does it make sense to depend on gtags for "reference" identification? Or will we start implementing this as "typeref" in the parsers?

vhda, Nov 04 '15 14:11

@vhda, sorry for the late reply. I had the same health trouble again.

I think Universal Ctags should provide an "interface" for references: the tags format and the command line.

As for the tags format, a "ref:" field may be needed, but I'm still thinking about what is best. As for the command line, ctags should at least accept --fields=+r; more may be needed. (The biggest issue in this area is the definition of "reference". I'm not yet sure what is a reference and what is not.)

The typeref field is closely related to your plan, the multi-pass parser for multiple files (MPMF). My understanding is that typeref and MPMF are about "definitions", not "references". Consider the following short C program:

typedef struct point { int x, y; } POINT;
POINT p;

$ ./ctags --fields='t' --kinds-c='*-m' -o -  input.c
POINT   input.c /^typedef struct point { int x, y; } POINT;$/;"  typeref:struct:point
p   input.c /^POINT p;$/
point   input.c /^typedef struct point { int x, y; } POINT;$/

p doesn't have a typeref field. With a multi-pass parser, ctags could generate the following tag entry for p:

p   input.c /^POINT p;$/;"  typeref:typedef:POINT

As this example shows, MP/MPMF is meaningful for definition tags. It may be meaningful for reference tags too, but you don't have to wait for reference-tag-related code before implementing MPMF.
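The second pass described above can be sketched in a few lines, assuming pass 1 produced a table of definition tags with kind names. All names here are hypothetical, and the parsing is deliberately naive: it accepts only the shape `TYPE var;` (so `struct point p;` is rejected) and does not escape '/' in the search pattern.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical first-pass result: a definition tag and its kind name. */
struct def_tag { const char *name; const char *kind; };

static const char *lookup_kind(const struct def_tag *defs, size_t n,
                               const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(defs[i].name, name) == 0)
            return defs[i].kind;
    return NULL;
}

/* Copy a C identifier from *p into buf (size >= 1) and advance *p. */
static size_t read_ident(const char **p, char *buf, size_t size)
{
    size_t n = 0;

    while (isspace((unsigned char)**p))
        (*p)++;
    while ((isalnum((unsigned char)**p) || **p == '_') && n + 1 < size)
        buf[n++] = *(*p)++;
    buf[n] = '\0';
    return n;
}

/* Second pass over a declaration "TYPE var;": emit a tag line for var,
 * adding typeref: when pass 1 knows TYPE. Returns 0 for other shapes. */
static int second_pass_decl(const char *line, const char *file,
                            const struct def_tag *defs, size_t n,
                            char *out, size_t size)
{
    char type[64], var[64];
    const char *p = line;
    const char *kind;

    if (read_ident(&p, type, sizeof type) == 0
        || read_ident(&p, var, sizeof var) == 0)
        return 0;
    while (isspace((unsigned char)*p))
        p++;
    if (*p != ';')
        return 0;
    kind = lookup_kind(defs, n, type);
    if (kind != NULL)
        snprintf(out, size, "%s\t%s\t/^%s$/;\"\ttyperef:%s:%s",
                 var, file, line, kind, type);
    else
        snprintf(out, size, "%s\t%s\t/^%s$/", var, file, line);
    return 1;
}
```

Given pass-1 entries for POINT (kind typedef) and point (kind struct), the line `POINT p;` produces the tag for p with `typeref:typedef:POINT`, as in the entry above.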

Once we introduce the interface for references, the next step is implementation. My understanding is that ctags is a provider of tags files and gtags is a consumer of them, so we should not depend on gtags.

Did I answer your question?

masatake, Nov 07 '15 20:11

@vhda, I think I misunderstood what you wrote. You used "typeref", and I assumed you meant it as a field name. Now I guess you meant it as a kind name.

masatake, Nov 10 '15 06:11

Hey @shigio, a draft version of gtags support works on my notebook PC.

[yamato@x201]~/var/ctags-github% cat input.sh
source commonFuncs.sh

foo() 
{
    return 0
}

[yamato@x201]~/var/ctags-github% ./ctags -o - input.sh                                                              
foo input.sh    /^foo() $/;"    f
[yamato@x201]~/var/ctags-github% ./ctags -o - --extra=+r input.sh                                                   
commonFuncs.sh  input.sh    /^source commonFuncs.sh$/;" s
foo input.sh    /^foo() $/;"    f
[yamato@x201]~/var/ctags-github% ./ctags -o - --extra=+r --fields=+r input.sh
commonFuncs.sh  input.sh    /^source commonFuncs.sh$/;" s   role:generic
foo input.sh    /^foo() $/;"    f
[yamato@x201]~/var/ctags-github% ./ctags -x --_xformat="%R %-16N %4n %-16F %C" --extra=+r --fields=+r input.sh
D foo                 3 input.sh         foo() 
R commonFuncs.sh      1 input.sh         source commonFuncs.sh

commonFuncs.sh is not a definition, so it should be captured as a reference tag. Capturing reference tags is activated with the --extra=+r option. --fields=+r lets ctags print the "role:" field; ctags assigns a value to the role field only if the tag is a reference. %R in --_xformat means:

[yamato@x201]~/var/ctags-github% ./ctags --list-fields | grep Marker
R   NONE    Marker(R or D) representing whether tag is definition or reference  format-char off

I introduced a new concept, "role", instead of rkind.

A kind represents what a tag is; a role represents how it is referenced.
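The kind/role split and the %R marker can be modeled with a few small types. This sketch is hypothetical (ctags' real descriptor types differ). It borrows the value -1, which this thread later notes is reserved as ROLE_INDEX_DEFINITION, to mark definitions; the Sh kind name "script" and its single "generic" role are assumptions chosen to mirror the session above.

```c
#include <stddef.h>

/* Later in this thread, -1 is noted as reserved under this name. */
#define ROLE_INDEX_DEFINITION (-1)

/* Hypothetical, simplified descriptors (not ctags' real types). */
struct role_def { const char *name; };
struct kind_def {
    char letter;
    const char *name;                 /* what the tag is */
    const struct role_def *roles;     /* how it can be referenced */
    int n_roles;
};
struct tag {
    const char *name;
    int kind_index;
    int role_index;   /* ROLE_INDEX_DEFINITION, or index into kind's roles */
};

/* Example table for the Sh session above (names are assumptions). */
static const struct role_def sh_script_roles[] = { { "generic" } };
static const struct kind_def sh_kinds[] = {
    { 'f', "function", NULL, 0 },
    { 's', "script", sh_script_roles, 1 },
};

/* Value for the %R marker: 'D' for definitions, 'R' for references. */
static char marker_of(const struct tag *t)
{
    return t->role_index == ROLE_INDEX_DEFINITION ? 'D' : 'R';
}

/* Value for the role: field; NULL when the tag is a definition. */
static const char *role_name(const struct kind_def *kinds, const struct tag *t)
{
    const struct kind_def *k = &kinds[t->kind_index];

    if (t->role_index < 0 || t->role_index >= k->n_roles)
        return NULL;
    return k->roles[t->role_index].name;
}
```

Because the role index, not the kind, decides D versus R, a single kind can appear both as a definition and as a reference.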

There are more areas I have to clean up, but it works anyway. Python namespaces, header files in C, and #undef in C are handled with reference tags.

#undef X

X is a macro-kind tag with an "undefined" or "undef" role. I will also look at your patch for Scheme. The role for Y in the following code may be "lvalue":

(set! Y 1)

Good night.

masatake, Nov 11 '15 18:11

This flexibility is great.

I have added a new test to GLOBAL's configure.ac to detect these options (I understand that they are still an alpha version). You can see it in the repository: [http://cvs.savannah.gnu.org/viewvc/global/global/]

I think this is a big step. Thank you, masatake-san.

shigio, Nov 13 '15 02:11

@shigio, I haven't looked at GLOBAL's configure yet, but I recommend checking for R in the --list-fields output. This will be the most stable way.

% ./ctags --list-fields | grep ^R
R   NONE    Marker(R or D) representing whether tag is definition or reference  format-char off

I would like to include the reference/role feature in the initial release of universal-ctags.

masatake, Nov 15 '15 11:11

I will keep this issue open until the Scheme-related code of the original PR is reviewed and merged. I will work on this topic after releasing 1.0.0.

masatake, Nov 24 '15 03:11

OS: Mac OS 10.11.5
universal ctags: installed by brew
global: 6.5.4, installed by brew (I edited its formula to add the configure option "--with-universal-ctags")

I tried GNU GLOBAL and universal ctags together on a big project, PostgreSQL. I ran gtags with gtags --gtagslabel=new-ctags.

Then gtags calls ctags as follows (from ps -ef | grep ctags):

ctags --langmap=Asm:.asm.ASM.s.S.A51.29k.29K,Asp:.asp.asa,Awk:.awk.gawk.mawk,Basic:.bas.bi.bb.pb,BETA:.bet,C:.c,C++:.c++.cc.cp.cpp.cxx.h.h++.hh.hp.hpp.hxx.inl,C#:.cs,Cobol:.cbl.cob.CBL.COB,DosBatch:.bat.cmd,Eiffel:.e,Erlang:.erl.ERL.hrl.HRL,Flex:.as.mxml,Fortran:.f.for.ftn.f77.f90.f95.f03.f08.f15,HTML:.htm.html,Java:.java,JavaScript:.js,Lisp:.cl.clisp.el.l.lisp.lsp,Lua:.lua,Make:.mak.mk,MatLab:.m,OCaml:.ml.mli.aug,Pascal:.p.pas,Perl:.pl.pm.plx.perl.ph,PHP:.php.php3.phtml.php4.php5.php7,Python:.py.pyx.pxd.pxi.scons,REXX:.rexx.rx,Ruby:.rb.ruby,Scheme:.SCM.SM.sch.scheme.scm.sm,Sh:.sh.SH.bsh.bash.ksh.zsh.ash,SLang:.sl,SML:.sml.sig,SQL:.sql,Tcl:.tcl.tk.wish.itcl,Tex:.tex,Vera:.vr.vri.vrh,Verilog:.v,VHDL:.vhdl.vhd,Vim:.vim.vba,YACC:.y,Ada:.adb.ads.Ada,Ant:.ant,Clojure:.clj,CoffeeScript:.coffee,CSS:.css,ctags:.ctags,D:.d.di,Diff:.diff.patch,DTS:.dts.dtsi,Falcon:.fal.ftd,gdbinit:.gdb,Go:.go,JSON:.json,m4:.m4.spt,ObjectiveC:.mm,Perl6:.p6.pm6.pl6,R:.r.R.q,reStructuredText:.rest.reST.rst,Rust:.rs,SystemVerilog:.sv.svh.svi,WindRes:.rc,Zephir:.zep --_xformat=%R %-16N %4n %-16F %C --extra=+r --fields=+r -xu --filter --filter-terminator=###terminator###^J

It looks good, but I cannot find references to a simple function with global, for example with global -r heap_insert.

@shigio @masatake What is the status of this feature now? Am I doing something wrong? Thanks.

ppggff, Apr 20 '16 09:04

You are doing nothing wrong. Although there is infrastructure for recording reference tags, only a few parsers utilize it.

masatake, Apr 20 '16 13:04

Hi @masatake, let me ask you a question about roles. "Universal Ctags Documentation Release 0.3.0" says the following:

A reference tag may have "role" information representing how it is referenced. Universal-ctags prints the role information when the r field is enabled with --fields=+r. If a tag doesn’t have a specialized role, generic is use as the name of role.

How can I express "a tag doesn't have a specialized role"? It seems that makeSimpleRefTag() requires a valid 'roleIndex' as its fourth argument. What should I pass for it?

Or should I always write code like the following?

typedef enum {
        K_FUNCTION, K_SET
} schemeKind;
/* Added */
typedef enum {
        R_SCHEME_GENERIC,
} schemeRole;
/* Added */
static roleDesc SchemeRoles [] = {
        { true, "generic", "generic" },
};
static kindDefinition SchemeKinds [] = {
        { true, 'f', "function", "functions", false},
        { true, 's', "set",      "sets",
        /* Added */
          .referenceOnly = true, ATTACH_ROLES(SchemeRoles)},
};
...
makeSimpleRefTag("aaa", "set", K_SET, R_SCHEME_GENERIC);

It would be convenient if roleIndex == -1 meant 'generic'.

// 'roleIndex == -1' means 'generic' role.
makeSimpleRefTag("aaa", "set", K_SET, -1);

What do you think? I hope it does not go against your intentions.

shigio, Nov 02 '17 01:11

Currently, a parser author must define R_SCHEME_GENERIC. Furthermore, such a role must be defined for each kind.

-1 is already reserved as:

#define ROLE_INDEX_DEFINITION -1

You can define a helper macro that defines a generic role for a kind:

#define defineGenericRole(LANG,Lang,lang,KIND,Kind) \
typedef enum {					       \
	LANG##_##KIND##_GENERIC_ROLE,		       \
} lang##Kind##Role;				       \
						       \
static roleDesc Lang##Kind##Roles [] = {	       \
	{ true, "generic",			       \
	  "generic" },				       \
}

defineGenericRole(SCHEME,Scheme,scheme,SET,Set);
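To see exactly which names the macro generates, here is a self-contained compile check. The `roleDesc` typedef is a stub with the three members used in the initializer above ({ enabled, name, description }), not ctags' real definition. The Kind argument is passed as `Set` so that the generated names read schemeSetRole and SchemeSetRoles; scheme_set_generic_role_name() is a hypothetical accessor added only for the check.

```c
#include <stdbool.h>

/* Stub standing in for ctags' roleDesc; just enough to compile. */
typedef struct {
    bool enabled;
    const char *name;
    const char *description;
} roleDesc;

#define defineGenericRole(LANG,Lang,lang,KIND,Kind) \
typedef enum {                                      \
    LANG##_##KIND##_GENERIC_ROLE,                   \
} lang##Kind##Role;                                 \
                                                    \
static roleDesc Lang##Kind##Roles [] = {            \
    { true, "generic", "generic" },                 \
}

/* Expands to:
 *   typedef enum { SCHEME_SET_GENERIC_ROLE, } schemeSetRole;
 *   static roleDesc SchemeSetRoles [] = { { true, "generic", "generic" }, };
 */
defineGenericRole(SCHEME,Scheme,scheme,SET,Set);

/* Hypothetical accessor, only to show the generated names in use. */
static const char *scheme_set_generic_role_name(void)
{
    schemeSetRole r = SCHEME_SET_GENERIC_ROLE;
    return SchemeSetRoles[r].name;
}
```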

masatake, Nov 02 '17 02:11