Request: Prolog Parser
Creating an issue for extending universal-ctags to handle Prolog [1].
... as per Masatake's request [2]
[1] https://en.wikipedia.org/wiki/Prolog
[2] https://github.com/universal-ctags/ctags/issues/1566#issuecomment-678057425
SWI Prolog syntax grammar/BNF?
http://swi-prolog.996271.n3.nabble.com/SWI-Prolog-syntax-grammar-BNF-td11598.html
SWI-Prolog provides a clone of Emacs that (I believe*) parses Prolog:
https://www.swi-prolog.org/pldoc/man?section=pceemacs
https://en.wikipedia.org/wiki/SWI-Prolog#PceEmacs
* I haven't used it; I'm familiar with Vim.
Thank you.
I will write an initial, minimal version of a Prolog parser, and I expect you to complete it. I would like you to read #2622 first. That issue explains how important design is in parser development.
Taken from [1]:
/* input.pl */
mother_child(trude, sally).
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
What kind of tags output do you expect? What should be tagged? What "kinds" should be assigned to the tags?
Hey Masatake,
Cheers for coming back so quickly. I'll read #2622.
Thanks also for your efforts writing the minimum parser. I'll do what I can to complete it, though will probably need assistance.
I'm not that familiar with ctags, so may not answer your question correctly.
Tags:
predicates: mother_child, father_child, etc.
atoms: tom, sally
variables: X, Y
Translation from Prolog speak:
"predicate" -> method (Prolog has no functions)
"atom" -> static value (distinct from a string)
"variable" -> :)
String -> string (text surrounded with ", for example "this is a SWI-Prolog string")
Text surrounded with ' is treated as an atom. abc and 'abc' are equivalent.
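To make the mapping above concrete, here is a minimal Python sketch that classifies a single token. The function name `classify_term` is hypothetical, and the rule that variables start with an uppercase letter or underscore is standard Prolog syntax rather than something stated in this thread. It is not how ctags works internally; it only illustrates the categories.

```python
import re

def classify_term(token: str) -> str:
    """Classify one Prolog token as 'string', 'atom', 'variable', or 'other'."""
    if re.fullmatch(r'"(?:[^"\\]|\\.)*"', token):
        return "string"    # double-quoted text (a string in SWI-Prolog)
    if re.fullmatch(r"'(?:[^'\\]|\\.)*'", token):
        return "atom"      # single-quoted text is an atom: 'abc' equals abc
    if re.fullmatch(r"[A-Z_][A-Za-z0-9_]*", token):
        return "variable"  # assumption: uppercase letter or underscore first
    if re.fullmatch(r"[a-z][A-Za-z0-9_]*", token):
        return "atom"      # unquoted atom starts with a lowercase letter
    return "other"

for tok in ["tom", "'abc'", "X", '"this is a SWI-Prolog string"']:
    print(tok, "->", classify_term(tok))
```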
I note that Prolog defines modules [1], for example:
36 :- module(charsio,
37           [ format_to_chars/3,    % +Format, +Args, -Codes
38             format_to_chars/4,    % +Format, +Args, -Codes, ?Tail
39             write_to_chars/2,     % +Term, -Codes
40             write_to_chars/3,     % +Term, -Codes, ?Tail
41             atom_to_chars/2,      % +Atom, -Codes
42             atom_to_chars/3,      % +Atom, -Codes, ?Tail
43             number_to_chars/2,    % +Number, -Codes
Is the definition for the module 'charsio' [2].
Predicates listed in the list (from line 37 onwards) are added to the global namespace. Predicates not listed are still accessible, by prepending the module name. For example charsio:some_other_predicate.
Fyi, the number following the predicate name (e.g. atom_to_chars/3) is the arity; it specifies the number of arguments....
[1] https://www.swi-prolog.org/pldoc/man?section=modules
[2] https://www.swi-prolog.org/pldoc/doc/SWI/library/charsio.pl?show=src
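As an illustration of the name/arity notation used in the export list, the following Python sketch splits an entry such as atom_to_chars/3 into its name and arity. The helper name `parse_indicator` is made up for this example; it is a sketch of the notation only, not part of ctags or SWI-Prolog.

```python
def parse_indicator(indicator: str) -> tuple[str, int]:
    """Split 'name/arity' into (name, arity); raise ValueError if malformed."""
    # rpartition splits on the last '/', so the arity is always the final part.
    name, _, arity = indicator.strip().rpartition("/")
    if not name or not arity.isdigit():
        raise ValueError(f"not a name/arity pair: {indicator!r}")
    return name, int(arity)

# A few entries taken from the charsio export list quoted above.
exports = ["format_to_chars/3", "format_to_chars/4", "write_to_chars/2"]
for item in exports:
    print(parse_indicator(item))
```

Note that format_to_chars/3 and format_to_chars/4 share a name but are distinct predicates, which is why a tag that records only the name loses information.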
If you have the time, here is a quick intro to Prolog:
https://www.youtube.com/watch?v=SykxWpFwMGs
It's 1 hour long, but you wouldn't need to watch anything like that much to see most of the syntax...
Thank you, but my Prolog knowledge is sufficient. What I really need is the expected tags output. That is the area where I cannot help you.
Let's focus on smaller input:
mother_child(trude, sally).
What should be tagged? If we had a perfect Prolog parser in ctags, which tokens should ctags capture?
Here's the syntax documentation for GNU Prolog:
http://gprolog.org/manual/html_node/gprolog019.html
mother_child(trude, sally).
If we had a perfect Prolog parser in ctags, which tokens should ctags capture?
I'm not that familiar with ctags yet. I can read more to be more helpful.
From what I know now, "mother_child" would be tagged as the equivalent of a C method, trude and sally as the equivalent of C strings.
... would it be helpful for me to write an "equivalent" C program and run it through u-ctags xref...?
My C is rusty, but it would be something like:
void mother_child( "trude", "sally" );
I know this isn't valid C; but trude and sally aren't variables here.
By comparison,
void mother_child( "trude", char *Name ) {
printf( "%s\n", Name );
}
would be:
mother_child( trude, Name ) :- writeln( Name ).
in Prolog.
OK, I wrote a minimal version of a Prolog parser.
$ cat input.pl
mother_child(trude, sally).
$ cat prolog.ctags
--langdef=Prolog
--map-Prolog=.pl
--kinddef-Prolog=p,predicate,predicates
--regex-Prolog=/^([a-zA-Z_]+)\([^.]+\)./\1/p/
$ ./ctags --options=./prolog.ctags --languages=Prolog -o - input.pl
mother_child input.pl /^mother_child(trude, sally).$/;" p language:Prolog
Is this the same as what you expect?
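For readers who want to experiment with the pattern outside ctags, here is a rough Python transcription of the regex from prolog.ctags, applied line by line to the sample input from earlier in the thread. Python's `re` module is not the regex engine ctags uses, so treat this as an approximation; the function name `find_predicates` is hypothetical.

```python
import re

# Transcription of the optlib pattern /^([a-zA-Z_]+)\([^.]+\)./
# Note: the final "." is unescaped in the original, so it matches any character.
PATTERN = re.compile(r"^([a-zA-Z_]+)\([^.]+\).")

SAMPLE = """\
/* input.pl */
mother_child(trude, sally).
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
"""

def find_predicates(source: str) -> list[tuple[str, int]]:
    """Return (name, line number) for every line the pattern matches."""
    tags = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = PATTERN.match(line)
        if m:
            tags.append((m.group(1), lineno))
    return tags

for name, lineno in find_predicates(SAMPLE):
    print(f"{name}\tinput.pl\tline:{lineno}")
```

Because the pattern requires an opening parenthesis, a zero-arity fact such as `a.` would not be tagged by this version.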
That was quick!
... I also think I see what you're doing. I can build on that...
Let's extend the input a bit.
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
The first question is whether parent_child should be tagged twice or once.
I guess you may want to have an arity: field like:
parent_child input.pl /^...$/;" p arity:2
Am I correct? If yes, what should I do for the input:
mother_child( trude, Name ) :- writeln( Name ).
Is the arity for mother_child 1 or 2?
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
The first question is whether parent_child should be tagged twice or once.
It should be tagged twice; they're multiple definitions of the predicate (method).
I guess you may want to have an arity: field like:
parent_child input.pl /^...$/;" p arity:2
Maybe, though I'm not sure how that would be used. Arity is just the count of predicate arguments...
mother_child( trude, Name ) :- writeln( Name ).
Is the arity for mother_child 1 or 2?
2 - the size of the argument list [ trude, Name ].
Everything after the :- is the "implementation" (in C terms... 😄 ).
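The rule described here (arity is the number of arguments in the head, and everything after :- is ignored) can be sketched in Python as follows. The function name `head_arity` is hypothetical, and the sketch assumes the clause is a single string whose head contains no `:-` inside a quoted atom.

```python
def head_arity(clause: str) -> int:
    """Count the arguments in the head of a Prolog clause (0 if none)."""
    # Keep only the head: the text before ":-", minus the final period.
    head = clause.split(":-", 1)[0].strip().rstrip(".").strip()
    open_paren = head.find("(")
    if open_paren == -1:
        return 0                      # e.g. "a." has no argument list
    args = head[open_paren + 1 : head.rfind(")")]
    depth, quote, count = 0, None, 1
    for ch in args:
        if quote:                     # inside '...' or "...": skip contents
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "([{":
            depth += 1                # nested term or list
        elif ch in ")]}":
            depth -= 1
        elif ch == "," and depth == 0:
            count += 1                # only top-level commas separate arguments
    return count

print(head_arity("mother_child( trude, Name ) :- writeln( Name )."))
print(head_arity("mother_child(trude, sally)."))
```

Both calls count two arguments, matching the answer given above: the body after :- does not affect the arity.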
$ cat input.pl
/* input.pl */
mother_child(trude, sally).
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
% dummy0()
/* dummy1() */
$ cat optlib/prolog.ctags
--langdef=Prolog
--map-Prolog=.pl
###
# kind definitions
#
--kinddef-Prolog=p,predicate,predicates
--kinddef-Prolog=v,variable,variables
###
# table declarations
#
--_tabledef-Prolog=main
--_tabledef-Prolog=args
--_tabledef-Prolog=impl
--_tabledef-Prolog=comment
--_tabledef-Prolog=comment_multiline
--_tabledef-Prolog=comment_oneline
--_tabledef-Prolog=any
--_tabledef-Prolog=ignoreWhiteSpace
###
# utilities
#
--_mtable-regex-Prolog=any/.//
--_mtable-regex-Prolog=ignoreWhiteSpace/[ \t\n]+//
###
# comment
#
--_mtable-regex-Prolog=comment/\/\*//{tenter=comment_multiline}
--_mtable-regex-Prolog=comment/\%//{tenter=comment_oneline}
--_mtable-regex-Prolog=comment_multiline/\*\///{tleave}
--_mtable-extend-Prolog=comment_multiline+any
--_mtable-regex-Prolog=comment_oneline/\n//{tleave}
--_mtable-extend-Prolog=comment_oneline+any
###
# main
#
--_mtable-extend-Prolog=main+comment
--_mtable-extend-Prolog=main+ignoreWhiteSpace
--_mtable-regex-Prolog=main/([a-zA-Z_][a-zA-Z_0-9]*)/\1/p/{scope=push}
--_mtable-regex-Prolog=main/\(//{tenter=args}
--_mtable-regex-Prolog=main/\.//{scope=pop}
--_mtable-regex-Prolog=main/:-//{tenter=impl}
--_mtable-extend-Prolog=main+any
###
# args
#
--_mtable-extend-Prolog=args+comment
--_mtable-extend-Prolog=args+ignoreWhiteSpace
--_mtable-regex-Prolog=args/\)//{tleave}
--_mtable-regex-Prolog=args/[a-z,]+//
--_mtable-regex-Prolog=args/([A-Z][A-Za-z]*)/\1/v/{scope=ref}
--_mtable-extend-Prolog=args+any
###
# impl
#
--_mtable-extend-Prolog=impl+comment
--_mtable-extend-Prolog=impl+ignoreWhiteSpace
# If . is found, push it back so the upper table (main) can handle it.
--_mtable-regex-Prolog=impl/\.//{tleave}{_advanceTo=0start}
--_mtable-extend-Prolog=impl+any
$ ./ctags --sort=no --options=./optlib/prolog.ctags --languages=Prolog -o - input.pl
mother_child input.pl /^mother_child(trude, sally).$/;" p language:Prolog
father_child input.pl /^father_child(tom, sally).$/;" p language:Prolog
father_child input.pl /^father_child(tom, erica).$/;" p language:Prolog
father_child input.pl /^father_child(mike, tom).$/;" p language:Prolog
sibling input.pl /^sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).$/;" p language:Prolog
X input.pl /^sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).$/;" v language:Prolog predicate:sibling
Y input.pl /^sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).$/;" v language:Prolog predicate:sibling
parent_child input.pl /^parent_child(X, Y) :- father_child(X, Y).$/;" p language:Prolog
X input.pl /^parent_child(X, Y) :- father_child(X, Y).$/;" v language:Prolog predicate:parent_child
Y input.pl /^parent_child(X, Y) :- father_child(X, Y).$/;" v language:Prolog predicate:parent_child
parent_child input.pl /^parent_child(X, Y) :- mother_child(X, Y).$/;" p language:Prolog
X input.pl /^parent_child(X, Y) :- mother_child(X, Y).$/;" v language:Prolog predicate:parent_child
Y input.pl /^parent_child(X, Y) :- mother_child(X, Y).$/;" v language:Prolog predicate:parent_child
I guess the arity field and the signature field should be filled in.
Feel free to reopen this.
Hey, not sure it's still relevant but for years I've used ptags (a ctags-like compatible tool that works for prolog) and it created files that work like a charm with xemacs and nedit.
https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/prolog/util/ptags/
Thank you for the information. However, the license limits how the code can be used in ctags:
/*
* ptags - creates entries in a tags file for Prolog predicates
*
* Usage: ptags [-w] [-l] [-a] [-p] file1 ... filen
*
* This program code may be freely distributed provided
*
* a) it, or any part of it, is not sold for profit; and
------------------------------------^^^^^^^^^^^^^^^^^^^
I wrote the author a mail to see if he'd be willing to change the license.
Hi, Pr. Tweed replied that he's OK with changing the licence. Which licence would be acceptable to you guys?
Also, can you give me your email address so I can include you in my reply?
Thank you. My email address is [email protected].
A. GNU General Public License version 2 or (at your option) any later version, which is found in https://github.com/universal-ctags/ctags/blob/master/COPYING,
B. the Unlicense, which is found in https://unlicense.org/, or
C. the MIT license, which is found in https://opensource.org/license/mit.
BTW, will you help me test the parser? We already have a prototype. https://github.com/universal-ctags/ctags/issues/2628#issuecomment-679816894
yeah, of course. I haven't touched prolog code in a while, but sure it could be fun :)
@dseddah, could you try #4249?
I have not used the ptags code directly.
$ cat input.pl
:- module(mymod, [a/0, a/1, a/2]).
a.
a(X) :-
X is 1.
a(X, Y) :- X is Y.
a('abc', 'efg') :- false.
/* /*
*/
dont_extract.
*/
prolog:message("msg").
b(X) --> c(X).
$ ./ctags --sort=no --fields=+Kne -o - ./input.pl
mymod ./input.pl /^:- module(mymod, [a\/0, a\/1, a\/2]).$/;" module line:1
a ./input.pl /^a.$/;" predicate line:2 module:mymod end:2 arity:0
a/0 ./input.pl /^a.$/;" predicate line:2 module:mymod end:2 arity:0
a ./input.pl /^a(X) :-$/;" predicate line:3 module:mymod end:4 arity:1
a/1 ./input.pl /^a(X) :-$/;" predicate line:3 module:mymod end:4 arity:1
a ./input.pl /^a(X, Y) :- X is Y.$/;" predicate line:5 module:mymod end:5 arity:2
a/2 ./input.pl /^a(X, Y) :- X is Y.$/;" predicate line:5 module:mymod end:5 arity:2
a ./input.pl /^a('abc', 'efg') :- false.$/;" predicate line:6 module:mymod end:6 arity:2
a/2 ./input.pl /^a('abc', 'efg') :- false.$/;" predicate line:6 module:mymod end:6 arity:2
message ./input.pl /^prolog:message("msg").$/;" predicate line:13 module:prolog end:13 arity:1
message/1 ./input.pl /^prolog:message("msg").$/;" predicate line:13 module:prolog end:13 arity:1
b ./input.pl /^b(X) --> c(X).$/;" grammar line:15 module:mymod end:15 arity:1
b/1 ./input.pl /^b(X) --> c(X).$/;" grammar line:15 module:mymod end:15 arity:1
The question is whether ctags should extract a predicate more than once when several clauses share the same name and arity.
ptags made only one tag for them. My parser makes a tag for each clause.
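The two policies can be compared with a small Python sketch. The clause list below is taken from the predicates named a in the sample output above (name, arity, line number); the function names are hypothetical, and neither function is the actual ptags or ctags implementation.

```python
# (name, arity, line) for each clause of "a" in the sample input.pl above.
clauses = [("a", 0, 2), ("a", 1, 3), ("a", 2, 5), ("a", 2, 6)]

def tag_every_clause(clauses):
    """Policy of the new parser: one tag per clause."""
    return list(clauses)

def tag_first_clause_only(clauses):
    """Policy attributed to ptags: one tag per distinct name/arity pair."""
    seen = set()
    tags = []
    for name, arity, line in clauses:
        key = (name, arity)
        if key in seen:
            continue          # a later clause of an already tagged predicate
        seen.add(key)
        tags.append((name, arity, line))
    return tags

print(tag_every_clause(clauses))
print(tag_first_clause_only(clauses))
```

With the first policy, a/2 is tagged at lines 5 and 6; with the second, only the clause at line 5 is tagged, so jumping to the tag lands on the first definition.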
... I also think I see what you're doing. I can build on that...
@TrentSe Any progress?