A new language definition for Zsh shell script

Open psprint opened this issue 3 years ago • 4 comments

I've defined a new language that parses shell scripts much better:

  • it allows arbitrary [[:alnum:].+:_-] characters in function names (runnables in the shell may use the same characters as file names),
  • it recognizes multiple variables on a single line, e.g.: local a=1 b=3.
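
A minimal sketch of these two points, using plain sed and grep rather than a ctags option file. The patterns and the helper names list_functions and list_locals are illustrative assumptions, not the contents of zsh.ctags:

```sh
# Hypothetical helpers; the patterns are illustrative, not copied from zsh.ctags.

# Print the names of functions written as "function name" or "name()",
# allowing letters, digits and . + : _ - in the name.
list_functions() {
    sed -n -E \
        -e 's/^[[:space:]]*function[[:space:]]+([[:alnum:].+:_-]+).*/\1/p' \
        -e 's/^[[:space:]]*([[:alnum:].+:_-]+)[[:space:]]*\(\).*/\1/p' "$1"
}

# Print every variable named on a local/typeset/declare line.
# Naive: option words such as -i are skipped, quoted values with spaces are not handled.
list_locals() {
    grep -E '^[[:space:]]*(local|typeset|declare)[[:space:]]' "$1" |
        sed -E 's/^[[:space:]]*(local|typeset|declare)[[:space:]]+//' |
        tr -s '[:space:]' '\n' |
        sed -n -E 's/^([[:alpha:]_][[:alnum:]_]*)(=.*)?$/\1/p'
}
```

Run on a file whose lines include `my-func.v1() {` and `local a=1 b=3`, list_functions prints my-func.v1 and list_locals prints a and b on separate lines.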

I'm using it successfully in the zinit Zsh plugin manager: https://asciinema.org/a/aEqd70X5oR6ruvr1OwrdJ4goG

The def file is at: https://github.com/zdharma-continuum/zinit/blob/main/share/zsh.ctags

I saw some information about optlib somewhere, saying that users are encouraged to submit custom language definitions, so I'm doing so.

psprint avatar Sep 29 '22 13:09 psprint

I suspect that shebang-only files (without an extension such as .sh) will not be processed. Is that true? If so, how can I work around this?

psprint avatar Sep 29 '22 13:09 psprint

Welcome!

About the video about the zsh plugin, I could not understand it well. My knowledge about the area may be too limited or the clock speed of my brain is too slow.

I saw some information about optlib somewhere, saying that users are encouraged to submit custom language definitions, so I'm doing so.

Yes.

I would like to integrate your work into our source tree, either directly or indirectly.

I've defined a new language that parses shell scripts much better:

As you may know, ctags has a parser for shell scripts. The existing one also has some interesting features:

% cat foo
#!/bin/sh

. /etc/site-common

cat > foo.c <<EOF
int main(void) {
 printf("%s, hello world\n", "$USER");
 return 0;
}
EOF

cat > foo.py <<EOF
def hello(user):
    print user + ' hello world'
hello('$USER')
EOF
% ~/bin//ctags --options=NONE -G -o - --extras=+r --fields=+Krl /tmp/foo
ctags: Notice: No options will be read from files or environment
/etc/site-common	/tmp/foo	/^. \/etc\/site-common$/;"	script	language:Sh	roles:loaded
EOF	/tmp/foo	/^EOF$/;"	heredoc	language:Sh	roles:endmarker
EOF	/tmp/foo	/^cat > foo.c <<EOF$/;"	heredoc	language:Sh	roles:def
EOF	/tmp/foo	/^cat > foo.py <<EOF$/;"	heredoc	language:Sh	roles:def
hello	/tmp/foo	/^def hello(user):$/;"	function	language:Python	roles:def
main	/tmp/foo	/^int main(void) {$/;"	function	language:C	typeref:typename:int	roles:def

So I would like to extend the shell script parser based on your zsh.ctags.

E.g., your zsh.ctags recognizes local. Support for local is also wanted in the existing shell script parser.

Whether or not you agree with me about extending the existing one, could you consider adding a copyright notice to your zsh.ctags, like the one in https://github.com/universal-ctags/ctags/blob/master/optlib/mesonOptions.ctags ?

What we need next are test cases. If we integrate your code into ctags, the test cases will of course be useful for maintaining the parser. Even if we instead extend the existing shell script parser based on zsh.ctags, the test cases will be useful. Either way, the test cases may lead us to a wonderful future in which ctags supports zsh shell scripts.

I suspect that shebang-only files (without an extension such as .sh) will not be processed. Is that true? If so, how can I work around this?

Use the -G option.

$ ~/bin/ctags --print-language /tmp/foo
~/bin/ctags --print-language /tmp/foo
/tmp/foo: NONE
$ ~/bin/ctags -G --print-language /tmp/foo
~/bin/ctags -G --print-language /tmp/foo
/tmp/foo: Sh

The existing shell script parser (Sh) handles both the .zsh extension and the zsh alias. You may need to disable them:

--map-Sh=-.zsh
--alias-Sh=-zsh 

You may want to add

--alias-zsh=+zsh

instead.

You use --langmap=zsh:.zsh. It can be written as --map-zsh=+.zsh.
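
Collected into one place, the mapping options above might look like the sketch below. It assumes zsh.ctags sits in the current directory and defines a language named zsh; the file name zsh-map.ctags is hypothetical:

```sh
# Hypothetical option file collecting the mapping tweaks listed above.
optfile=${TMPDIR:-/tmp}/zsh-map.ctags
cat > "$optfile" <<'EOF'
# Stop the built-in Sh parser from claiming Zsh files.
--map-Sh=-.zsh
--alias-Sh=-zsh
# Route the .zsh extension and the zsh alias to the "zsh" language instead.
--map-zsh=+.zsh
--alias-zsh=+zsh
EOF

# Try it only when ctags, ./zsh.ctags, and at least one input file are available.
if command -v ctags >/dev/null 2>&1 && [ -f ./zsh.ctags ] && [ "$#" -gt 0 ]; then
    ctags --options=./zsh.ctags --options="$optfile" -G --print-language "$@"
fi
```

The zsh language has to be defined before the --map-zsh and --alias-zsh lines are read, which is why zsh.ctags is loaded first.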

masatake avatar Sep 29 '22 18:09 masatake

Before creating the parser, I extended Exuberant Ctags to:

  • support Vim and Emacs modelines for detecting shell scripts (e.g., a Vim modeline could be: vim:ft=sh): https://github.com/psprint/zcommodore/commit/9451bea0d71f0777cef6a8b51ddd9713a9793627,
  • detect local/typeset/declare/etc. and multiple variables after them: https://github.com/psprint/zcommodore/commit/97f3d73c3d2a8430f2a285a79209f632a9f71ab0, https://github.com/psprint/zcommodore/commit/813c14d7561278c39a4d40c283abf8966f92b61b,
  • fix two crashes, one for vim: https://github.com/psprint/zcommodore/commit/3850a6a316d31b1eed3033c141b8d38b6faf2606 and one for shell: https://github.com/psprint/zcommodore/commit/63ffd6990df932a36410959cea957d8df315a8a5.

The full changes are listed at: https://github.com/psprint/zcommodore/blob/master/myctags/exuberant-ctags-improvements-for-Sh-language.patch in the repo of an unfinished Zsh plugin, which shipped a custom ctags source in the myctags subdirectory: https://github.com/psprint/zcommodore/

psprint avatar Sep 30 '22 08:09 psprint

About the video about the zsh plugin, I could not understand it well. My knowledge about the area may be too limited or the clock speed of my brain is too slow.

PS. The video presents the new kinds found by the zsh.ctags option-file parser. Basically, I used my daily TAGS viewers (fzf-ctags, for example) to present all the tags it detects (functions with characters outside [a-z] and multiple local variable definitions).

psprint avatar Sep 30 '22 09:09 psprint

About https://github.com/psprint/zcommodore/commit/9451bea0d71f0777cef6a8b51ddd9713a9793627: u-ctags has the same feature: https://github.com/universal-ctags/ctags/commit/3c806c71c29bfa3a5ab43e4d5ddaeecbeb73885a .

About https://github.com/psprint/zcommodore/commit/97f3d73c3d2a8430f2a285a79209f632a9f71ab0: this one is quite interesting. Capturing variables in the Sh parser is one of the popular requests. I have a question about "local". How do you use the tags for local variables? The C parser also extracts local variables. That may be useful for building a tree structure of the source file in the "sidebar" of an editor. Building the tree structure is possible because the C parser also records the scope of each local variable, that is, the function where the local variable is defined. In the Sh parser, you don't record the scope for local variables, so I wonder how useful it is.

$ cat /tmp/foo.sh 
#!/bin/bash

f0()
{
    local x
    :
}

f1()
{
    local x
    :
}
$ ~/bin/ctags -o - --sort=no --fields=+K /tmp/foo.sh
f0	/tmp/foo.sh	/^f0()$/;"	function
x	/tmp/foo.sh	/^	local x$/;"	variable
f1	/tmp/foo.sh	/^f1()$/;"	function
x	/tmp/foo.sh	/^	local x$/;"	variable

This one is a manually written example. You will find two x. There is not enough information for rebuilding a tree structure like:

f0
 `- x
f1
 `- x

Of course, there are more applications than building a tree structure. However, as far as I know, building a tree structure is a popular one. For building a tree structure, we need the following output:

$ ~/bin/ctags -o - --sort=no --fields=+K /tmp/foo.sh
f0	/tmp/foo.sh	/^f0()$/;"	function
x	/tmp/foo.sh	/^	local x$/;"	variable	scope:function:f0
f1	/tmp/foo.sh	/^f1()$/;"	function
x	/tmp/foo.sh	/^	local x$/;"	variable	scope:function:f1

I think we have more areas to improve. To fill in the scope fields of variables, we have to make the Sh parser track the pairs of { and }. It does not look easy to me.
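
A rough sketch of such brace tracking, written with awk outside of ctags. It is only an illustration of the idea, not sh.c code: the helper name scoped_tags and the simplified three-column output are assumptions, and braces inside quotes or comments are counted naively:

```sh
# Hypothetical helper: print "name<TAB>kind[<TAB>scope]" lines for the file in $1.
scoped_tags() {
    awk '
        # "name()" optionally followed by "{" starts a function definition.
        /^[ \t]*[A-Za-z0-9_.:+-]+[ \t]*\(\)/ {
            pending = $0
            sub(/^[ \t]*/, "", pending)
            sub(/[ \t]*\(\).*$/, "", pending)
            printf "%s\tfunction\n", pending
        }
        {
            line = $0
            opens = gsub(/[{]/, "", line)
            closes = gsub(/[}]/, "", line)
            # The first "{" seen at depth 0 after a header opens that function body.
            if (opens > 0 && depth == 0 && pending != "") {
                current = pending
                pending = ""
            }
            depth += opens
            if ($1 == "local" || $1 == "typeset" || $1 == "declare") {
                for (i = 2; i <= NF; i++) {
                    name = $i
                    sub(/=.*/, "", name)
                    if (name !~ /^[A-Za-z_][A-Za-z0-9_]*$/) continue
                    if (current != "")
                        printf "%s\tvariable\tscope:function:%s\n", name, current
                    else
                        printf "%s\tvariable\n", name
                }
            }
            depth -= closes
            # Back at depth 0 means the function body has been closed.
            if (depth <= 0) { depth = 0; current = "" }
        }
    ' "$1"
}
```

Applied to the /tmp/foo.sh input shown earlier, it reports the first x with scope:function:f0 and the second with scope:function:f1.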

About https://github.com/psprint/zcommodore/commit/3850a6a316d31b1eed3033c141b8d38b6faf2606, it seems that u-ctags doesn't have the same change. It would be nice to have a test input that needs the change.

About https://github.com/psprint/zcommodore/commit/63ffd6990df932a36410959cea957d8df315a8a5, it would be nice to have a test input that needs the change.

masatake avatar Oct 02 '22 12:10 masatake

About your improvement to the Vim parser, I would like you to open a new issue (or pull request) if you are interested in merging your work into u-ctags. Of course, I am interested. However, I don't know Vim well; all I know about it is "hjkl", so a contributor must drive the merging tasks.

About your zsh parser, I think the core part of your work, extracting variables, is not zsh-specific. The improvement is also meaningful for parsing bash shell scripts. So, if possible, I would like to merge your work and knowledge into sh.c.

masatake avatar Oct 02 '22 12:10 masatake

Thanks for examining my changes. I'm surprised that there's vim/emacs modeline support (if run with the -G option?). I've updated some of my Makefile targets and other scripts to use it; however, it seems that defining a language called "zsh" requires an emacs modeline with mode: zsh. I mean, it's already impressive that the two names ("zsh") are matched, and that's the first thing I would like to stress. However, the problems are:

  • Emacs doesn't have, AFAIK, a zsh mode, only a sh mode,
  • a Vim modeline like # vim:ft=zsh:sw=4:sts=4, on the other hand, is apparently ignored.

So parsing tags from files without a ….zsh extension that contain the two modeline types basically requires mode: zsh in the emacs modeline, which is, as mentioned, problematic.

What's the status of Vim modeline support? Are Vim modelines examined at all? Also, could one alter the file type string that is searched for in them, e.g., with something like -G zsh?
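
For reference, this is roughly how a file type can be read from a Vim modeline in plain shell. It is not how ctags implements modeline detection; the helper name vim_filetype is hypothetical, and only the first and last five lines are scanned, assuming Vim's default 'modelines' value of 5:

```sh
# Hypothetical helper: print the ft=/filetype= value of the first Vim modeline
# found in the first or last five lines of the file in $1.
# Simplification: the modeline keyword must be preceded by whitespace.
vim_filetype() {
    { head -n 5 "$1"; tail -n 5 "$1"; } |
        sed -n -E 's/.*[[:space:]]vim?:(.*[[:space:]:])?(ft|filetype)=([[:alnum:]_]+).*/\3/p' |
        head -n 1
}
```

For a file containing the line # vim:ft=zsh:sw=4:sts=4, vim_filetype prints zsh.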

Thank you for supporting my request. I would like to become a u-ctags contributor. It would be best if some entry-level difficulties could be eased for me, e.g., by pointing me to the area where local/declare/typeset lines are parsed. It is true that my zsh regex-based parser is bash-compatible.

PS. One other thing for the zsh regex/native parser: support for autoload functions/lines. This is a zsh-only feature, in which:

  • their location is any dir in $FPATH, a (possibly exported) variable with the same format as $PATH: a colon-separated list of dirs/paths,
  • the files in there are the function bodies,
  • to call such a function, one issues autoload fn_name, which creates an empty function stub and fills it with the content when called.

I wonder how to:

  • modify the zsh.ctags regex parser to provide tags for the first line of the functions/files,
  • add such support also to sh.c?

Example:

% print "print Hi!" > ./hi
% FPATH+=:$PWD
% autoload hi
% hi
Hi!
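
The lookup rule described above can be sketched in plain shell. This only illustrates the idea (one tag per file in each $FPATH directory, pointing at line 1) and is not code from zsh.ctags or sh.c; the helper name autoload_tags and the output layout are assumptions:

```sh
# Hypothetical helper: print one tag line per autoloadable file found in the
# colon-separated directory list given as $1 (same format as $FPATH).
autoload_tags() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r dir; do
        [ -d "$dir" ] || continue
        for file in "$dir"/*; do
            [ -f "$file" ] || continue
            case $file in *.zwc) continue ;; esac   # skip compiled .zwc files
            # name <TAB> path <TAB> line 1 <TAB> kind
            printf '%s\t%s\t1;"\tfunction\n' "${file##*/}" "$file"
        done
    done
}
```

With the hi file from the example, autoload_tags "$PWD" would print a single function tag named hi that points at line 1 of ./hi.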

psprint avatar Oct 07 '22 13:10 psprint

UPDATE: I've managed to overcome the mode: zsh problem via --alias-zsh=+sh. However, another important limitation appeared: the dash/minus sign (-) is not accepted in function names. This should be fixed, because any character is allowed in bash and zsh function names: a function is a runnable, like a command on disk, so it has to accept any character that can appear in a file name on disk, and even more, since a slash is also allowed in a function/runnable name.

psprint avatar Oct 07 '22 14:10 psprint

This issue includes several sub-issues about zsh. I cannot track, discuss, fix, and improve them all simultaneously.

Could you make a meta issue for improving zsh support, like the issues in https://github.com/universal-ctags/ctags/issues?q=is%3Aopen+is%3Aissue+label%3A%22DASH+BOARD%22 ?

We can then open issues one by one, picked from the list you make.

When opening an issue for one of them, please show the command line output, the input source code, and the expected tags output. I'm not good at English, so I cannot easily understand what you want.

In many cases, you can explain the issue you find as a test case.

https://docs.ctags.io/en/latest/testing-parser.html explains the way to test the behavior of a parser.

https://docs.ctags.io/en/latest/testing-ctags.html explains the way to test the ctags common part; we call the part "main". Guessing a proper parser for a given input is part of "main".

Thank you for supporting my request. I would like to become a u-ctags contributor.

Thank you very much, and welcome.

Before starting, I would like you to read https://docs.ctags.io/en/latest/man-pages.html, especially https://docs.ctags.io/en/latest/man/ctags.1.html#language-selection-and-mapping-options and https://docs.ctags.io/en/latest/man/ctags.1.html#determining-file-language.

masatake avatar Oct 08 '22 20:10 masatake

I prototyped a new Zsh parser supporting autoload: #3499.

masatake avatar Oct 09 '22 09:10 masatake