Add support for extracting documentation from Lua files

Open wincent opened this issue 3 years ago • 3 comments

Seeing as I've been porting a bit of stuff from Vimscript to Lua lately, I don't really want to have to maintain .vim files just to hold documentation comments, although I will if I have to.

Related: #43

wincent, Oct 13 '20 22:10

Just leaving this here in case it proves useful:

  • Lua parser: https://github.com/rozbb/nom-lua53
  • Written using nom: https://github.com/Geal/nom (a Rust-based parser-combinator library)

Also:

  • https://blog.cloudflare.com/building-fast-interpreters-in-rust/
  • https://thume.ca/2019/04/18/writing-a-compiler-in-rust/

wincent, Jul 09 '21 23:07

Also, from "The Complete Syntax of Lua":

chunk ::= {stat [`;´]} [laststat [`;´]]

block ::= chunk

stat ::=  varlist `=´ explist | 
         functioncall | 
         do block end | 
         while exp do block end | 
         repeat block until exp | 
         if exp then block {elseif exp then block} [else block] end | 
         for Name `=´ exp `,´ exp [`,´ exp] do block end | 
         for namelist in explist do block end | 
         function funcname funcbody | 
         local function Name funcbody | 
         local namelist [`=´ explist] 

laststat ::= return [explist] | break

funcname ::= Name {`.´ Name} [`:´ Name]

varlist ::= var {`,´ var}

var ::=  Name | prefixexp `[´ exp `]´ | prefixexp `.´ Name 

namelist ::= Name {`,´ Name}

explist ::= {exp `,´} exp

exp ::=  nil | false | true | Number | String | `...´ | function | 
         prefixexp | tableconstructor | exp binop exp | unop exp 

prefixexp ::= var | functioncall | `(´ exp `)´

functioncall ::=  prefixexp args | prefixexp `:´ Name args 

args ::=  `(´ [explist] `)´ | tableconstructor | String 

function ::= function funcbody

funcbody ::= `(´ [parlist] `)´ block end

parlist ::= namelist [`,´ `...´] | `...´

tableconstructor ::= `{´ [fieldlist] `}´

fieldlist ::= field {fieldsep field} [fieldsep]

field ::= `[´ exp `]´ `=´ exp | Name `=´ exp | exp

fieldsep ::= `,´ | `;´

binop ::= `+´ | `-´ | `*´ | `/´ | `^´ | `%´ | `..´ | 
         `<´ | `<=´ | `>´ | `>=´ | `==´ | `~=´ | 
         and | or

unop ::= `-´ | not | `#´

plus:

Names (also called identifiers) in Lua can be any string of letters,
digits, and underscores, not beginning with a digit. This coincides with
the definition of names in most languages. (The definition of letter
depends on the current locale: any character considered alphabetic by
the current locale can be used in an identifier.) Identifiers are used
to name variables and table fields.

The following keywords are reserved and cannot be used as names:

     and       break     do        else      elseif
     end       false     for       function  if
     in        local     nil       not       or
     repeat    return    then      true      until     while

Lua is a case-sensitive language: and is a reserved word, but And and
AND are two different, valid names. As a convention, names starting
with an underscore followed by uppercase letters (such as _VERSION) are
reserved for internal global variables used by Lua.
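
A rough sketch of the Name rule above, in Python purely for illustration. `LUA51_KEYWORDS`, `NAME_RE` and `is_name` are hypothetical names, and the pattern is ASCII-only, so it ignores the locale-dependent letters the manual mentions:

```python
import re

# Reserved words, copied from the Lua 5.1 list quoted above.
LUA51_KEYWORDS = frozenset("""
    and break do else elseif end false for function if
    in local nil not or repeat return then true until while
""".split())

# Assumption: ASCII letters only; locale-specific letters are not handled.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_name(token: str) -> bool:
    """Return True if token looks like a valid Lua 5.1 Name (identifier)."""
    return bool(NAME_RE.match(token)) and token not in LUA51_KEYWORDS
```

Because the keyword check is case-sensitive, `is_name("and")` is false while `is_name("And")` and `is_name("AND")` are true, matching the paragraph above.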

The following strings denote other tokens:

     +     -     *     /     %     ^     #
     ==    ~=    <=    >=    <     >     =
     (     )     {     }     [     ]
     ;     :     ,     .     ..    ...

Literal strings can be delimited by matching single or double quotes,
and can contain the following C-like escape sequences: '\a' (bell),
'\b' (backspace), '\f' (form feed), '\n' (newline), '\r' (carriage
return), '\t' (horizontal tab), '\v' (vertical tab), '\\' (backslash),
'\"' (quotation mark [double quote]), and '\'' (apostrophe [single
quote]). Moreover, a backslash followed by a real newline results in a
newline in the string. A character in a string can also be specified
by its numerical value using the escape sequence \ddd, where ddd is a
sequence of up to three decimal digits. (Note that if a numerical escape
is to be followed by a digit, it must be expressed using exactly three
digits.) Strings in Lua can contain any 8-bit value, including embedded
zeros, which can be specified as '\0'.
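
To make the escape rules above concrete, here is a minimal decoder sketch in Python. `decode_short_string` is a hypothetical helper. It assumes the input is the text between the quotes with one character per byte, and it rejects escapes that are not listed above, which may be stricter than the reference implementation:

```python
# Single-character escapes from the 5.1 text above, mapped to byte values.
SIMPLE_ESCAPES = {
    "a": 7, "b": 8, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11,
    "\\": 92, '"': 34, "'": 39,
    "\n": 10,  # backslash followed by a real newline
}
DIGITS = "0123456789"


def decode_short_string(body: str) -> bytes:
    """Decode the text between the quotes of a Lua 5.1 short string."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("latin-1")  # assumption: one character per byte
            i += 1
            continue
        i += 1  # step over the backslash
        if i == len(body):
            raise ValueError("dangling backslash")
        if body[i] in DIGITS:
            # \ddd: consume up to three decimal digits.
            j = i
            while j < len(body) and j < i + 3 and body[j] in DIGITS:
                j += 1
            value = int(body[i:j])
            if value > 255:
                raise ValueError("decimal escape out of range")
            out.append(value)
            i = j
        elif body[i] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[body[i]])
            i += 1
        else:
            # Choice made by this sketch, not a statement about Lua itself.
            raise ValueError("unknown escape: \\" + body[i])
    return bytes(out)
```

The "exactly three digits" note shows up in the digit loop: `\049` followed by `23` decodes as byte 49 and then the literal characters `23`.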

Literal strings can also be defined using a long format enclosed by long
brackets. We define an opening long bracket of level n as an opening
square bracket followed by n equal signs followed by another opening
square bracket. So, an opening long bracket of level 0 is written as
[[, an opening long bracket of level 1 is written as [=[, and so on. A
closing long bracket is defined similarly; for instance, a closing long
bracket of level 4 is written as ]====]. A long string starts with an
opening long bracket of any level and ends at the first closing long
bracket of the same level. Literals in this bracketed form can run for
several lines, do not interpret any escape sequences, and ignore long
brackets of any other level. They can contain anything except a closing
bracket of the proper level.

For convenience, when the opening long bracket is immediately followed
by a newline, the newline is not included in the string. As an example,
in a system using ASCII (in which 'a' is coded as 97, newline is coded
as 10, and '1' is coded as 49), the five literal strings below denote
the same string:

     a = 'alo\n123"'
     a = "alo\n123\""
     a = '\97lo\10\04923"'
     a = [[alo
     123"]]
     a = [==[
     alo
     123"]==]
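
The long-bracket rules can be sketched like this (Python for illustration only). `read_long_bracket` is a hypothetical helper, and it only treats a bare `\n` as the "immediately following newline":

```python
import re

# An opening long bracket: '[', n equal signs, '['.
OPEN_RE = re.compile(r"\[(=*)\[")


def read_long_bracket(src: str, pos: int = 0):
    """Return (contents, end_index) for a long bracket starting at pos.

    Returns None when no opening long bracket starts at pos.
    """
    m = OPEN_RE.match(src, pos)
    if m is None:
        return None
    level = len(m.group(1))
    start = m.end()
    # A newline right after the opening bracket is not part of the string.
    if src.startswith("\n", start):
        start += 1
    # Only a closing bracket of the same level ends the literal.
    close = "]" + "=" * level + "]"
    end = src.find(close, start)
    if end == -1:
        raise ValueError("unfinished long bracket")
    return src[start:end], end + len(close)
```

Both bracketed spellings in the example above come out as the same contents, and a level-1 literal such as `[=[a]]b]=]` keeps the inner `]]` untouched.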

A numerical constant can be written with an optional decimal part and
an optional decimal exponent. Lua also accepts integer hexadecimal
constants, by prefixing them with 0x. Examples of valid numerical
constants are

     3   3.0   3.1416   314.16e-2   0.31416E1   0xff   0x56
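
A pattern for these 5.1 numerical constants might look like the following sketch. `NUMBER_RE` and `is_number` are hypothetical names. Accepting forms such as `3.` and `.5` is my reading of "optional decimal part", not something the quoted text spells out:

```python
import re

NUMBER_RE = re.compile(r"""
      0[xX][0-9A-Fa-f]+                      # hexadecimal integer, e.g. 0xff
    | (?: [0-9]+ \.? [0-9]* | \. [0-9]+ )    # decimal digits, optional decimal part
      (?: [eE] [+-]? [0-9]+ )?               # optional decimal exponent
""", re.VERBOSE)


def is_number(token: str) -> bool:
    """Return True if the whole token is a Lua 5.1 numerical constant."""
    return NUMBER_RE.fullmatch(token) is not None
```

All seven examples from the manual pass, while incomplete forms such as `0x` or `1e` are rejected.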

A comment starts with a double hyphen (--) anywhere outside a
string. If the text immediately after -- is not an opening long
bracket, the comment is a short comment, which runs until the end
of the line. Otherwise, it is a long comment, which runs until the
corresponding closing long bracket. Long comments are frequently used to
disable code temporarily.
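
Since comments are what a documentation extractor actually cares about, here is a minimal scanner sketch based on the rules above (Python for illustration only). `extract_comments` and `_long_bracket` are hypothetical names, and the string handling is deliberately simplified to "skip to the matching quote, honouring backslashes":

```python
import re

LONG_OPEN = re.compile(r"\[(=*)\[")


def _long_bracket(src, pos):
    """If a long bracket opens at pos, return (body, end_index), else None."""
    m = LONG_OPEN.match(src, pos)
    if m is None:
        return None
    close = "]" + m.group(1) + "]"
    end = src.find(close, m.end())
    if end == -1:  # unterminated: treat the rest of the input as the body
        return src[m.end():], len(src)
    return src[m.end():end], end + len(close)


def extract_comments(src: str) -> list:
    """Return the bodies of all short and long comments, in source order."""
    comments = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if src.startswith("--", i):
            found = _long_bracket(src, i + 2)
            if found:                      # long comment: --[[ ... ]]
                body, i = found
            else:                          # short comment: runs to end of line
                end = src.find("\n", i)
                end = n if end == -1 else end
                body, i = src[i + 2:end], end
            comments.append(body)
        elif ch in "\"'":                  # short string: skip it entirely
            i += 1
            while i < n and src[i] != ch:
                i += 2 if src[i] == "\\" else 1
            i += 1
        elif ch == "[":                    # long string: skip it entirely
            found = _long_bracket(src, i)
            i = found[1] if found else i + 1
        else:
            i += 1
    return comments
```

The string branches are what keep a `--` inside a literal from being mistaken for a comment. A real extractor would also need to track line numbers and group adjacent short comments, which this sketch leaves out.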

wincent, Jul 20 '21 18:07

For comparison, here is the slightly more comprehensive grammar from the 5.3 manual (the one above is from the 5.1 manual). Note, though, that Neovim uses LuaJIT, which means 5.1.

chunk ::= block

block ::= {stat} [retstat]

stat ::=  ‘;’ | 
         varlist ‘=’ explist | 
         functioncall | 
         label | 
         break | 
         goto Name | 
         do block end | 
         while exp do block end | 
         repeat block until exp | 
         if exp then block {elseif exp then block} [else block] end | 
         for Name ‘=’ exp ‘,’ exp [‘,’ exp] do block end | 
         for namelist in explist do block end | 
         function funcname funcbody | 
         local function Name funcbody | 
         local namelist [‘=’ explist] 

retstat ::= return [explist] [‘;’]

label ::= ‘::’ Name ‘::’

funcname ::= Name {‘.’ Name} [‘:’ Name]

varlist ::= var {‘,’ var}

var ::=  Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name 

namelist ::= Name {‘,’ Name}

explist ::= exp {‘,’ exp}

exp ::=  nil | false | true | Numeral | LiteralString | ‘...’ | functiondef | 
         prefixexp | tableconstructor | exp binop exp | unop exp 

prefixexp ::= var | functioncall | ‘(’ exp ‘)’

functioncall ::=  prefixexp args | prefixexp ‘:’ Name args 

args ::=  ‘(’ [explist] ‘)’ | tableconstructor | LiteralString 

functiondef ::= function funcbody

funcbody ::= ‘(’ [parlist] ‘)’ block end

parlist ::= namelist [‘,’ ‘...’] | ‘...’

tableconstructor ::= ‘{’ [fieldlist] ‘}’

fieldlist ::= field {fieldsep field} [fieldsep]

field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp

fieldsep ::= ‘,’ | ‘;’

binop ::=  ‘+’ | ‘-’ | ‘*’ | ‘/’ | ‘//’ | ‘^’ | ‘%’ | 
         ‘&’ | ‘~’ | ‘|’ | ‘>>’ | ‘<<’ | ‘..’ | 
         ‘<’ | ‘<=’ | ‘>’ | ‘>=’ | ‘==’ | ‘~=’ | 
         and | or

unop ::= ‘-’ | not | ‘#’ | ‘~’

And:

Lua ignores spaces (including new lines) and comments between lexical
elements (tokens), except as delimiters between names and keywords.

Names (also called identifiers) in Lua can be any string of letters,
digits, and underscores, not beginning with a digit and not being a
reserved word. Identifiers are used to name variables, table fields, and
labels.

The following keywords are reserved and cannot be used as names:

     and       break     do        else      elseif    end
     false     for       function  goto      if        in
     local     nil       not       or        repeat    return
     then      true      until     while

The following strings denote other tokens:

     +     -     *     /     %     ^     #
     &     ~     |     <<    >>    //
     ==    ~=    <=    >=    <     >     =
     (     )     {     }     [     ]     ::
     ;     :     ,     .     ..    ...

A short literal string can be delimited by matching single or double
quotes, and can contain the following C-like escape sequences: '\a'
(bell), '\b' (backspace), '\f' (form feed), '\n' (newline), '\r'
(carriage return), '\t' (horizontal tab), '\v' (vertical tab), '\\'
(backslash), '\"' (quotation mark [double quote]), and '\'' (apostrophe
[single quote]). A backslash followed by a line break results in a
newline in the string. The escape sequence '\z' skips the following span
of white-space characters, including line breaks; it is particularly
useful to break and indent a long literal string into multiple lines
without adding the newlines and spaces into the string contents. A short
literal string cannot contain unescaped line breaks nor escapes not
forming a valid escape sequence.

We can specify any byte in a short literal string by its numeric value
(including embedded zeros). This can be done with the escape sequence
\xXX, where XX is a sequence of exactly two hexadecimal digits, or with
the escape sequence \ddd, where ddd is a sequence of up to three decimal
digits. (Note that if a decimal escape sequence is to be followed by a
digit, it must be expressed using exactly three digits.)

The UTF-8 encoding of a Unicode character can be inserted in a literal
string with the escape sequence \u{XXX} (note the mandatory enclosing
brackets), where XXX is a sequence of one or more hexadecimal digits
representing the character code point.
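
The escapes that 5.3 adds over 5.1 (`\z`, `\xXX` and `\u{XXX}`) can be sketched as below. `decode_53_extras` is a hypothetical helper. It only rewrites those three escapes and passes every other backslash escape through unchanged, so it is meant to sit in front of a 5.1-style decoder rather than replace one:

```python
import re

ESCAPE_RE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})"      # \xXX: exactly two hexadecimal digits
    r"|u\{([0-9A-Fa-f]+)\}"        # \u{XXX}: one or more hexadecimal digits
    r"|z[ \t\n\r\f\v]*"            # \z: skip the following white space
    r"|(.))",                      # any other escape: left untouched
    re.DOTALL,
)


def decode_53_extras(body: str) -> bytes:
    """Expand \\xXX, \\u{XXX} and \\z in the body of a short literal string."""
    out = bytearray()
    pos = 0
    for m in ESCAPE_RE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")
        if m.group(1) is not None:
            out.append(int(m.group(1), 16))
        elif m.group(2) is not None:
            # Insert the UTF-8 encoding of the code point.
            out += chr(int(m.group(2), 16)).encode("utf-8", "surrogatepass")
        elif m.group(3) is not None:
            out += m.group(0).encode("utf-8")  # keep other escapes verbatim
        # A \z match contributes nothing.
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)
```

Matching "any other escape" as its own alternative is what stops an escaped backslash followed by `x41` from being misread as `\x41`.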

Literal strings can also be defined using a long format enclosed by long
brackets. We define an opening long bracket of level n as an opening
square bracket followed by n equal signs followed by another opening
square bracket. So, an opening long bracket of level 0 is written as
[[, an opening long bracket of level 1 is written as [=[, and so on. A
closing long bracket is defined similarly; for instance, a closing long
bracket of level 4 is written as ]====]. A long literal starts with an
opening long bracket of any level and ends at the first closing long
bracket of the same level. It can contain any text except a closing
bracket of the same level. Literals in this bracketed form can run
for several lines, do not interpret any escape sequences, and ignore
long brackets of any other level. Any kind of end-of-line sequence
(carriage return, newline, carriage return followed by newline, or
newline followed by carriage return) is converted to a simple newline.
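
The end-of-line conversion described above is small enough to sketch in a few lines (Python for illustration; `normalize_eol` is a hypothetical name):

```python
import re

# Order matters: try the two-character sequences before the single characters.
EOL_RE = re.compile(r"\r\n|\n\r|\r|\n")


def normalize_eol(text: str) -> str:
    """Convert every end-of-line sequence in a long literal to a simple newline."""
    return EOL_RE.sub("\n", text)
```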

For convenience, when the opening long bracket is immediately followed
by a newline, the newline is not included in the string. As an example,
in a system using ASCII (in which 'a' is coded as 97, newline is coded
as 10, and '1' is coded as 49), the five literal strings below denote
the same string:

     a = 'alo\n123"'
     a = "alo\n123\""
     a = '\97lo\10\04923"'
     a = [[alo
     123"]]
     a = [==[
     alo
     123"]==]

Any byte in a literal string not explicitly affected by the previous
rules represents itself. However, Lua opens files for parsing in text
mode, and the system file functions may have problems with some control
characters. So, it is safer to represent non-text data as a quoted
literal with explicit escape sequences for the non-text characters.

A numeric constant (or numeral) can be written with an optional
fractional part and an optional decimal exponent, marked by a letter 'e'
or 'E'. Lua also accepts hexadecimal constants, which start with 0x or
0X. Hexadecimal constants also accept an optional fractional part plus
an optional binary exponent, marked by a letter 'p' or 'P'. A numeric
constant with a radix point or an exponent denotes a float; otherwise,
if its value fits in an integer, it denotes an integer. Examples of
valid integer constants are

     3   345   0xff   0xBEBADA

Examples of valid float constants are

     3.0     3.1416     314.16e-2     0.31416E1     34e1
     0x0.1E  0xA23p-4   0X1.921FB54442D18P+1
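
The integer versus float distinction in 5.3 can be sketched as follows. `parse_numeral` is a hypothetical helper, and Python's unbounded `int` means the "if its value fits in an integer" rule is not modelled here:

```python
import re

HEX_RE = re.compile(
    r"0[xX](?:[0-9A-Fa-f]+\.?[0-9A-Fa-f]*|\.[0-9A-Fa-f]+)"  # digits, optional fraction
    r"(?:[pP][+-]?[0-9]+)?"                                  # optional binary exponent
)
DEC_RE = re.compile(
    r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)"                          # digits, optional fraction
    r"(?:[eE][+-]?[0-9]+)?"                                  # optional decimal exponent
)


def parse_numeral(token: str):
    """Return an int or a float for a Lua 5.3 numeral, following the rules above."""
    if HEX_RE.fullmatch(token):
        # A radix point or a binary exponent makes it a float.
        if "." in token or "p" in token.lower():
            return float.fromhex(token)
        return int(token, 16)
    if DEC_RE.fullmatch(token):
        # A radix point or a decimal exponent makes it a float.
        if any(c in token for c in ".eE"):
            return float(token)
        return int(token)
    raise ValueError("not a numeral: " + token)
```

Note that the `E` in `0x0.1E` is a hexadecimal digit rather than an exponent marker, which is why the hexadecimal pattern is tried first.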

A comment starts with a double hyphen (--) anywhere outside a
string. If the text immediately after -- is not an opening long
bracket, the comment is a short comment, which runs until the end
of the line. Otherwise, it is a long comment, which runs until the
corresponding closing long bracket. Long comments are frequently used to
disable code temporarily.

wincent, Jul 20 '21 18:07