
Use Documenter

Open · goretkin opened this pull request 3 years ago · 7 comments

I did not add a badge (Dev) to the README since there isn't much documentation yet. But with this scaffolding in place, it will be easier to add some documentation gradually.

goretkin avatar Feb 28 '21 22:02 goretkin

Codecov Report

Merging #170 (a4e0bca) into master (c945d46) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #170   +/-   ##
=======================================
  Coverage   88.34%   88.34%           
=======================================
  Files           6        6           
  Lines        1047     1047           
=======================================
  Hits          925      925           
  Misses        122      122           

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c945d46...a4e0bca.

codecov-io avatar Feb 28 '21 22:02 codecov-io

Personally, I think that Documenter is a bit overkill for this package since the exposed API is so small (meaning the extra indirection over just looking at the README is significant). Is there any concrete reason why this would be an improvement?

KristofferC avatar Mar 01 '21 08:03 KristofferC

Granted, what prompted this PR was not knowing what RawToken was, and there's an issue already open to document it in the README. But I think the barrier for adding documentation is much lower if it's just a docstring.

Using Documenter means that you can meaningfully cross-reference identifiers.
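A sketch of what that enables (the docstring text here is illustrative, not taken from the package):

"""
    untokenize(t)

Return the string representation of the token `t`. See also [`tokenize`](@ref).
"""

Documenter resolves the [`tokenize`](@ref) link to wherever tokenize is documented, so related identifiers stay navigable in the generated pages.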

To the point about the exposed API:

  1. I think it's worth documenting internals that are not exposed, and leveraging cross-referencing where it serves a purpose. That would help me understand this package, and make it more useful to me.

  2. It's not clear to me what the exposed API actually is. Here's one guess:

export tokenize, untokenize, Tokens

and

It takes a string or a buffer and creates an iterator that sequentially returns the next Token until the end of the string or buffer. The argument to tokenize can be either a String, an IOBuffer, or an IOStream.

startpos(t)::Tuple{Int, Int} # row and column where the token starts
endpos(t)::Tuple{Int, Int}   # row and column where the token ends
startbyte(t)::Int            # byte offset where the token starts
endbyte(t)::Int              # byte offset where the token ends
untokenize(t)::String        # string representation of the token
kind(t)::Tokens.Kind         # kind of the token
exactkind(t)::Tokens.Kind    # exact kind of the token
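For reference, a minimal usage sketch of that documented interface (only tokenize, untokenize, and Tokens are exported, so the accessors are qualified):

using Tokenize

for t in tokenize("x + 1")               # also accepts an IOBuffer or IOStream
    println(Tokens.kind(t), " => ", repr(untokenize(t)))
end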

However, other packages are using more than just those exports and what is documented in the README.

https://github.com/julia-vscode/CSTParser.jl/blob/335219c92df1614ab9a701cc8f2caf93c2e2ab5b/src/CSTParser.jl#L6-L8

import Tokenize.Tokens: RawToken, AbstractToken, iskeyword, isliteral, isoperator, untokenize
import Tokenize.Lexers: Lexer, peekchar, iswhitespace

https://github.com/julia-vscode/LanguageServer.jl/blob/0b40df159af055a5faea3922692aae8dd707f233/src/utilities.jl#L167

t = CSTParser.Tokenize.Lexers.next_token(ts)

https://github.com/julia-vscode/DocumentFormat.jl/blob/772e0916a56af1193d5d03464ce11f89a8a58988/src/passes.jl#L207

while !Tokenize.Lexers.eof(ts)

Many of these already have docstrings. Some may be missing a docstring, or have a self-documenting name. And some call sites should be rewritten to use the exposed interface (for example, iterate may be preferable to next_token; see the sketch below).
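As a sketch of that last point (consume is a hypothetical stand-in for whatever the caller does with each token), the loop from the LanguageServer.jl snippet above could move off the internal lexer functions:

using Tokenize

str = "x + 1"

# Internal API, as used above:
ts = tokenize(str)
while true
    t = Tokenize.Lexers.next_token(ts)
    consume(t)                                   # hypothetical consumer
    Tokens.kind(t) == Tokens.ENDMARKER && break
end

# Roughly equivalent loop over the exposed iteration interface,
# which handles end-of-input itself:
for t in tokenize(str)
    consume(t)
end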

goretkin avatar Mar 01 '21 19:03 goretkin

It's not clear to me what the exposed API actually is.

The one that is documented.

However, other packages are using more than just those exports and what is documented in the README.

They use internals of the package that have been found by looking at the source code. Standard caveats apply.

KristofferC avatar Mar 02 '21 07:03 KristofferC

The one that is documented.

You mean e.g. the functions that have docstrings?

goretkin avatar Mar 02 '21 08:03 goretkin

No, I mean the ones that are in the documentation (the README). There can be a discussion about exposing more inside Tokenize.jl as public, but I just want to be clear that anything that is not listed in the documentation is considered internal and subject to change.

KristofferC avatar Mar 02 '21 15:03 KristofferC

Okay, perfect.

There are already docstrings on functions that are not exposed. Why are they there? And assuming we're not going to remove them, why not generate "documentation" (distinct from the README) that displays them nicely? The scaffolding for that is small; see the sketch below.
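For concreteness, a typical minimal Documenter setup looks like the following (a sketch, not necessarily the exact contents of this PR):

# docs/make.jl
using Documenter, Tokenize

makedocs(
    sitename = "Tokenize.jl",
    modules  = [Tokenize],
    pages    = ["Home" => "index.md"],
)

docs/src/index.md can then splice in the existing docstrings, exported or not, with an @autodocs block such as Modules = [Tokenize, Tokenize.Tokens, Tokenize.Lexers].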

It's been discussed elsewhere a few times that in the Julia ecosystem there's often a conflation among:

  1. What is documented
  2. What, if changed in a backwards-incompatible way, causes a semver major version bump
  3. What is exported by a module

Is that fair to say? And if so, this PR is just about 1, so it's not immediately relevant that this package has a small exposed interface (2 and 3).

EDIT: I re-ordered 1-3 to reflect the nesting relationship that I expect. 3 should be a subset of 2, which should be a subset of 1. To me, that means anything can be documented without consequence.

I don't know of a way in Julia itself ("in band") to distinguish between 1 and 2. I think it's fine if this package uses the README ("out of band") to delineate items in 2.

goretkin avatar Mar 02 '21 16:03 goretkin