ctags Merge support for the Vala language

The Anjuta project maintains its own fork of ctags [2], which supports Vala through some patches [1]. It would be nice to integrate them into universal-ctags.

[1] http://abderrahim.arablug.org/GSoC2008/ctags-vala.diff
[2] https://git.gnome.org/browse/anjuta/tree/plugins/symbol-db/anjuta-tags

Oct 10 '15 13:10 LemonBoy

Thank you for requesting.

We already know about it (https://github.com/universal-ctags/ctags/blob/master/docs/tracking.rst).

I looked into it further.

The license (GPLv2+) is stated explicitly at the top of the source code. Good.
It seems to be written in the Vala language. (A translator from Vala to C is used.)
It seems that it has no test cases.
It extends the typeRef field.

Instead of merging the Vala translator into universal ctags, I think it would be better to make the Vala parser a standalone executable. Universal ctags could then use that standalone executable (valatags) as an xcmd.

Who maintains the vala parser? Anjuta project?

Oct 11 '15 09:10 masatake

Who maintains the vala parser?

It seems like they reuse the parser from libvala, which is the parser used by the Vala compiler. That's probably why the language support is written in Vala.

What is required to hook up a standalone executable to ctags? Is there some documentation for xcmd?

Aug 31 '17 20:08 craigbarnes

Xcmd is gone. See https://github.com/universal-ctags/ctags/issues/1442. So we have to write the parser for ctags from scratch.

Sep 01 '17 02:09 masatake

So we have to write the parser for ctags from scratch.

The libvala library that Anjuta uses is callable from C. Is it acceptable for ctags parsers to depend on external dynamic libraries? The Vala support could be made optional at configure-time, for anyone who wants to avoid the dependency.

Sep 03 '17 04:09 craigbarnes

Geany's c.c has decent Vala support, though I expect it'd be quite painful to bring it into here.

Sep 03 '17 11:09 codebrainz

@codebrainz, thank you. I looked at c.c. There are only a few conditions/branches about Vala. So I think importing it is not so difficult, I guess.

The kind definitions for Vala can be taken from ctags-vala.diff and Geany's c.c. The last critical issue is that there are no test cases.

Sep 03 '17 17:09 masatake

There are only a few conditions/branches about Vala. So I think importing it is not so difficult, I guess.

I wouldn't know, last time I looked in c.c I woke up in cold-sweats after nightmares :)

Sep 03 '17 22:09 codebrainz

@codebrainz, you are correct. It is much more difficult than I expected.

Sep 07 '17 12:09 masatake

I found a completely different approach to merging the Vala parser from Geany.

First, remove from Geany's c.c all code for parsers other than Vala (C, C++, Java, ...). Then rename it to vala.c. Finally, put vala.c into u-ctags.

Much of the code in vala.c and in u-ctags's c.c will be duplicated. However, I don't mind, because the code base is already difficult to maintain.

Oct 04 '17 16:10 masatake

I've got a new-from-scratch Vala parser I started a while ago for Geany and I never properly finished (mostly because I was playing with various ideas making a parser, none too clever in the end though), but I should be able to resurrect it for u-ctags.

Oct 04 '17 19:10 b4n

playing with various ideas making a parser

This is more interesting. This is the area we should explore more.

Oct 04 '17 20:10 masatake

The main idea I was playing with was an automatic token generator using regexes. However, it's kind of stupid in the sense that it still requires a hand-written parser to handle the tokens, and is (a lot) slower¹ (and less common) than e.g. flex. I didn't experiment (yet) writing a CTags parser with the help of flex (and possibly yacc/bison), but I guess it would be totally fine.

¹ I didn't perform any measurements, but I honestly don't think any are needed to know that flex would be a lot faster than any dynamic matching of generic regular expressions :)
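For readers unfamiliar with the idea, here is a minimal sketch of a token generator driven by a table of regular expressions. It is written in Python purely for illustration and is not the code from the unfinished Geany parser; the token names and patterns in TOKEN_SPEC are assumptions, not an accurate Vala lexical grammar. As noted above, it only produces tokens, so a hand-written parser is still needed on top of it.

```python
import re

# Hypothetical token table: (name, pattern). Earlier entries win on ties.
# The patterns are illustrative and do not cover the full Vala syntax.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("STRING", r'"(?:\\.|[^"\\])*"'),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("PUNCT", r"[{}()\[\];,=*.]"),
    ("SKIP", r"\s+"),
]

# One master regex built from the table; each alternative is a named group.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Yield (kind, text) pairs, dropping whitespace and comments."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            pos += 1  # skip a character the table does not cover
            continue
        pos = match.end()
        if match.lastgroup not in ("SKIP", "COMMENT"):
            yield match.lastgroup, match.group()


SAMPLE = r'''
/* Vala Hello World */
void main(string[] args) {
    print("Hello, World\n");
}
'''

for kind, text in tokenize(SAMPLE):
    print(kind, text)
```

The table-driven design keeps the lexical rules in one place, which is roughly what a flex input file would do, except that flex compiles the rules ahead of time instead of interpreting them at run time.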

Oct 04 '17 20:10 b4n

YES.

See lflex branch of https://github.com/masatake/ctags.git .

I was thinking about a parser for flex using flex.

What I did first was write some glue code and find a way to integrate flex into our build system.

The syntax of flex (as a target language) is too complicated for initial testing, though I do want to have a flex parser in u-ctags.

I would like to try writing a scanner with flex for a simpler language. CMakeLists.txt is one of the candidates. Could you recommend an interesting target language that is simple enough?

I am less interested in performance than you are. However, I expect flex will make writing a parser simpler.

Oct 04 '17 20:10 masatake

Currently, the --_mtable-regex parser is slow. However, I think I can translate its options into flex input, and flex may then generate an optimized scanner.

Oct 04 '17 20:10 masatake

I have quite a bit of experience with Flex and Bison, and integrating into Autotools. Let me know if you need any help setting it up, it can be a bit of a pain (especially with Bison) to get it working properly and re-entrantly.

@masatake a Vala lexer in Flex would be pretty easy (just tokenizing). The only tricky bit is that Vala has context-sensitive keywords (ex. get, set, etc) which would need info from a parser to handle exactly (if that's even needed for ctags).
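To illustrate the context-sensitive keyword problem, here is a small Python sketch. It treats get and set as plain identifiers by default and promotes them to keywords only when the neighbouring tokens look like a property accessor. The neighbour check is an assumed heuristic standing in for the information a real parser would supply; it is not how the Vala compiler decides, and mark_contextual_keywords is a hypothetical name.

```python
import re

# Words that are keywords only in certain contexts (from the comment above).
CONTEXTUAL = {"get", "set"}

# Crude tokenizer: identifiers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")


def mark_contextual_keywords(source):
    """Return (token, kind) pairs with kind in {"keyword", "ident", "punct"}.

    Assumed heuristic: get/set is an accessor keyword when it starts an
    accessor (previous token is "{", ";" or "}") and is followed by ";"
    or "{". Anything else, such as a method named get, stays an identifier.
    """
    tokens = TOKEN_RE.findall(source)
    result = []
    for i, tok in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        if (
            tok in CONTEXTUAL
            and prev_tok in ("{", ";", "}")
            and next_tok in (";", "{")
        ):
            kind = "keyword"
        elif tok[0].isalpha() or tok[0] == "_":
            kind = "ident"
        else:
            kind = "punct"
        result.append((tok, kind))
    return result


# A property: get and set act as keywords here.
print(mark_contextual_keywords("public int age { get; set; }"))
# A method named get: it stays an identifier.
print(mark_contextual_keywords("public int get (int i) { return i; }"))
```

For tag generation this level of precision may be unnecessary, since only the declared names matter, which matches the caveat in the comment above.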

Vala actually used to use Lex in the compiler, see here.

Oct 04 '17 23:10 codebrainz

@codebrainz, thank you. I will use this as an example in lflex branch.

Oct 05 '17 14:10 masatake

Hi, I'm a GNOME Foundation member, a collaborator on GNOME Builder, and a gitg maintainer (gitg is written in Vala). I care about Vala integration, and ctags support for Vala would be a great addition.

I just add anjuta-tags to my PATH and it works pretty well. But I'm not sure how much it differs from ctags, and I want to help merge it into ctags.

Is there any work in progress (even if old) that I can rebase or try to fix?

Any info is appreciated. Even if the plan is to rework the implementation, just let me know and I will try to collaborate on that.

Nov 12 '19 08:11 albfan

@albfan, the important topics were already discussed here.

The situation on the ctags side has changed a bit.

The xcmd feature (invoking an external command from ctags) has been removed.
packcc, a parser generator, has been introduced to ctags. (However, only the varlink parser uses it.)

@b4n, @codebrainz, do you have any comments on @albfan's message?

@albfan, do you insist that u-ctags should use (or import) the code developed in anjuta-tags, or not? If you are willing to write a Vala parser from scratch, I can show you what options u-ctags can offer.

Nov 12 '19 21:11 masatake

I tried to port the change, and fileLocation is now a MIOPos, so we can discard it.

I will use anjuta-tags as a guide if I hit a blocker (since it works for Vala). Just that.

I can reimplement it in pure C, but some guidance is needed @masatake.

Nov 12 '19 22:11 albfan

I can reimplement it in pure C, but some guidance is needed @masatake.

I see.

I would like you to know the concepts in Ctags: kind, field, and extra. See https://docs.ctags.io/en/latest/man/ctags.1.html#tag-entries.

parsers/tcl.c may be helpful. The Tcl parser uses main/tokeninfo.[ch]. I wrote tokeninfo.[ch] after studying how the other parsers are written, extracting the essence of those parsers. tcl.c also calls some subparser-related functions, which I introduced to support itcl and tcloo.

U-ctags has corkQueue. With corkQueue, you can update a tagEntry after you have already passed it to makeTagEntry (or makeSimpleTag)! This feature helps you record scope information. See https://docs.ctags.io/en/latest/internal.html#output-tag-stream .
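As a rough mental model of the corkQueue idea, here is a Python sketch, not the u-ctags C API. Entries are buffered instead of being written immediately, the "make" call returns an index, and the entry can still be modified through that index until the queue is flushed. The class, method names, and output format below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagEntry:
    name: str
    kind: str
    line: int
    scope: Optional[str] = None  # may be filled in later


class CorkQueue:
    """Hypothetical model: buffer tag entries until flush() is called."""

    def __init__(self) -> None:
        self.entries: List[TagEntry] = []

    def make_tag_entry(self, entry: TagEntry) -> int:
        """Queue an entry and return its index for later updates."""
        self.entries.append(entry)
        return len(self.entries) - 1

    def get(self, index: int) -> TagEntry:
        return self.entries[index]

    def flush(self) -> List[str]:
        """Render all queued entries as tab-separated lines and empty the queue."""
        lines = []
        for e in self.entries:
            line = f"{e.name}\t{e.kind}\tline:{e.line}"
            if e.scope:
                line += f"\tscope:{e.scope}"
            lines.append(line)
        self.entries.clear()
        return lines


queue = CorkQueue()
index = queue.make_tag_entry(TagEntry("main", "method", 3))
# Later the parser learns which scope the tag belongs to and patches the entry.
queue.get(index).scope = "Demo"
print(queue.flush())
```

The point of the design is that a parser does not need to know everything about a tag at the moment it first sees the name.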

See items 1 and 4 in https://docs.ctags.io/en/latest/man/ctags-optlib.7.html#overview-for-defining-a-parser. Designing the kinds is the most important step in writing a parser. See also https://docs.ctags.io/en/latest/contributions.html.

U-ctags has a good test facility. See https://docs.ctags.io/en/latest/units.html.

Feel free to ask questions here. For more than 4 years, I have been looking for people like you who are trying to write a parser. Good luck!

Nov 13 '19 00:11 masatake

Thanks @masatake, I will read through it. I just opened PR #2320 as a WIP (it doesn't even compile right now).

Nov 13 '19 08:11 albfan

I have the plumbing in the PR, so I'm now heading for the Vala hello world:

void main(string[] args) {
    print("Hello, World\n");
}

The Java and C parsers should help here in recreating the parser.

Everything will be done with Units, of course.

The docs directory is really good, and so are the HTML documentation and man pages.

I'm gonna enjoy this.

Nov 13 '19 19:11 albfan

I just added comment parsing from cpreprocessor, and it works as expected, but for functions, namespaces, I'm not sure how to proceed.

I'm revisiting anjuta-tags: it is written in Vala because it uses the parser that comes with Vala, which makes everything super easy, the same way the YAML parser uses libyaml.

Adding a dependency on Vala is a bad idea, though it could come disabled by default.

I will give the tcl-like parser a few more tries, but it is not obvious to me how to detect that main is a function.

/*
 * Vala Hello World
 */
void main(string[] args) {
    print("Hello, World\n");
}

Nov 14 '19 18:11 albfan

Where can I get the language reference manual or something like that? We need the URL for it first of all.

Next, could you tell me a large application or library written in Vala? I would like to add it to https://github.com/universal-ctags/codebase in the future.

I will show a micro prototype.

Nov 15 '19 00:11 masatake

Where can I get the language reference manual or something like that?

Not everything is 100% documented, but the language reference/manual is here: https://wiki.gnome.org/Projects/Vala/Manual

Next, could you tell me a large application or library written in Vala?

The compiler itself is probably a good relatively large example application written in Vala, and it contains some libraries like Gee as well: https://gitlab.gnome.org/GNOME/vala

but it is not obvious to me how to detect that main is a function

It might be a little tricky with a hand-rolled parser since the Vala grammar is not context-free, and unlike C/C++, it doesn't require forward declarations. If I remember correctly, Vala's parser does it by doing a first pass and keeping multiple ways to parse in the AST and then once it knows what all the types are, it does another pass over the AST to resolve the ambiguities.

If the dependency on Vala (and thus GLib/GObject/friends) is acceptable, using Vala's own parser indeed seems like a much easier approach.

Nov 15 '19 00:11 codebrainz

Hi!

If the dependency on Vala (and thus GLib/GObject/friends) is acceptable, using Vala's own parser indeed seems like a much easier approach.

I would like to know the detail of FRIENDS.

Nov 15 '19 00:11 masatake

I just meant whatever GLib/GObject depends on, so I guess like on my version of Ubuntu:

$ apt-cache depends libglib2.0-0
libglib2.0-0
  Depends: libc6
  Depends: libffi6
  Depends: libmount1
  Depends: libpcre3
  Depends: libselinux1
  Depends: zlib1g
  Recommends: libglib2.0-data
  Recommends: shared-mime-info
    shared-mime-info:i386
  Recommends: xdg-user-dirs
    xdg-user-dirs:i386

Though if I recall some of those are also embedded in GLib source tree so may not actually require separate packages like on Debian/Ubuntu.

Nov 15 '19 01:11 codebrainz

Thank you. Can I ask one more question?

It might be a little tricky with a hand-rolled parser since the Vala grammar is not context-free, and unlike C/C++, it doesn't require forward declarations. If I remember correctly, Vala's parser does it by doing a first pass and keeping multiple ways to parse in the AST and then once it knows what all the types are, it does another pass over the AST to resolve the ambiguities.

I would like to see typical tricky example code for which writing a parser is hard. (A small one is enough, of course.) As you know well, our C parser doesn't process include-files. So the C parser handles input without information available via forward declarations.

Nov 15 '19 02:11 masatake

I would like to see typical tricky example code for which writing a parser is hard.

I don't know enough about CTags parsers to know what would be tricky, I'm mostly speaking about a traditional parser that needs to correctly parse the input 100%. Since Geany's c.c parses most Vala declarations with some success, it's surely doable.

IIRC, the main thing missing from Geany's Vala parser is descending into function bodies, where it's a little trickier to tell things apart, ex. A * B; could be a variable declaration or a multiplication expression statement depending on what A and B are.
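A toy Python sketch of the A * B; ambiguity follows. A first pass collects the type names declared anywhere in the input, and a second pass uses them to decide whether each statement is a declaration or an expression. This only illustrates the general two-pass idea; it is not how Vala's parser or any ctags parser is implemented, and the regular expressions and function name are assumptions.

```python
import re

# Pass 1 pattern: names introduced by a type declaration (illustrative subset).
TYPE_DECL_RE = re.compile(r"\b(?:class|struct|enum|interface)\s+([A-Za-z_]\w*)")

# Pass 2 pattern: statements of the shape "A * B;".
STAR_STMT_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\*\s*([A-Za-z_]\w*)\s*;")


def resolve_star_statements(source):
    """Classify each "A * B;" as a declaration or an expression."""
    known_types = set(TYPE_DECL_RE.findall(source))  # pass 1: collect types
    results = []
    for left, right in STAR_STMT_RE.findall(source):  # pass 2: resolve
        kind = "declaration" if left in known_types else "expression"
        results.append((left, right, kind))
    return results


# Foo is declared after its use, so a single forward pass could not decide.
SAMPLE = """
void f() { Foo * p; a * b; }
class Foo { }
"""

print(resolve_star_statements(SAMPLE))
# prints [('Foo', 'p', 'declaration'), ('a', 'b', 'expression')]
```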

Also, depending whether the lexer/tokenizer has only one look-ahead token or if it has more (ex. unbounded) look-ahead tokens, and how hard it is to backtrack in CTags parsers, something like @albfan's example might be a bit tricky, like if it consumed an A and the next token is a B, it could either be a variable declaration or a function declaration. It has to look-ahead one more token to see if it's a ( in which case it's a function declaration or a = or ; for a variable declaration and backtrack as needed.

A B    // ?
A B (  // function
A B ;  // variable
A B =  // initialized variable

It must already be supported though, since as you said C and C++ and similar languages can be mostly handled OK.
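The table above can be sketched in Python as a classifier that reads two identifier tokens and then peeks at the third token to decide. It is a simplified illustration under the assumption that the declaration starts at the first token, with no modifiers, generics, or qualified names; classify_declaration is a hypothetical helper, not a ctags function.

```python
import re

# Crude tokenizer: identifiers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")


def is_ident(token):
    return token[0].isalpha() or token[0] == "_"


def classify_declaration(source):
    """Classify "A B <lookahead>" using the third token only."""
    tokens = TOKEN_RE.findall(source)
    if len(tokens) < 3:
        return "unknown"  # "A B" alone cannot be decided yet
    type_tok, name_tok, lookahead = tokens[:3]
    if not (is_ident(type_tok) and is_ident(name_tok)):
        return "unknown"
    if lookahead == "(":
        return "function"
    if lookahead == ";":
        return "variable"
    if lookahead == "=":
        return "initialized variable"
    return "unknown"


print(classify_declaration("void main(string[] args) {"))  # prints function
print(classify_declaration("int count;"))                  # prints variable
print(classify_declaration("int count = 1;"))              # prints initialized variable
print(classify_declaration("A B"))                         # prints unknown
```

Applied to the hello world example, the token after main is (, which is enough to tag main as a function without looking at the body.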

As you know well, our C parser doesn't process include-files. So the C parser handles input without information available via forward declarations.

Yeah good point, I can't think of anything in Vala that's harder to parse correctly than C or C++, if it doesn't need to be completely correct parsing.

Nov 15 '19 03:11 codebrainz

I reviewed c.c (where the C and Java parsers are implemented), as both languages are pretty similar to Vala. Once I see things like prevToken, it sounds to me like reimplementing ANTLR or flex (I might be wrong, of course).

So I opened another PR, #2325, where some problems with autotools and the code are still left to solve.

Nov 15 '19 08:11 albfan