Tokenize.jl
Tokenize.jl copied to clipboard
Use Documenter
I did not add a badge () to the README since there isn't much documentation yet. But with this scaffolding in place, it will be easier to add some documentation gradually.
Codecov Report
Merging #170 (a4e0bca) into master (c945d46) will not change coverage. The diff coverage is
n/a
.
@@ Coverage Diff @@
## master #170 +/- ##
=======================================
Coverage 88.34% 88.34%
=======================================
Files 6 6
Lines 1047 1047
=======================================
Hits 925 925
Misses 122 122
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update c945d46...a4e0bca. Read the comment docs.
Personally, I think that Documenter is a bit overkill for this package since the exposed API is so small (meaning the extra indirection over just looking at the README is significant). Is there any concrete reason why this would be an improvement?
Granted, what prompted this PR was not knowing what RawToken
was, and there's an issue already open to document it in the README. But I think the barrier for adding documentation is much lower if it's just a docstring.
Using Documenter means that you can meaningfully cross-reference identifiers.
To the point about the exposed API:
-
I think it's worth documenting internals that are not exposed, and leveraging cross-referencing where it serves a purpose. That would help me understand this package, and make it more useful to me.
-
I think it's not clear to me what the exposed API actually is. Here's one guess:
export tokenize, untokenize, Tokens
and
It takes a string or a buffer and creates an iterator that will sequentially return the next
Token
until the end of string or buffer. The argument totokenize
can either be aString
,IOBuffer
or anIOStream
.
startpos(t)::Tuple{Int, Int} # row and column where the token start
endpos(t)::Tuple{Int, Int} # row and column where the token ends
startbyte(T)::Int # byte offset where the token start
endbyte(t)::Int # byte offset where the token ends
untokenize(t)::String # string representation of the token
kind(t)::Token.Kind # kind of the token
exactkind(t)::Token.Kind # exact kind of the token
However, other packages are using more than just those exports and what is documented in the README.
https://github.com/julia-vscode/CSTParser.jl/blob/335219c92df1614ab9a701cc8f2caf93c2e2ab5b/src/CSTParser.jl#L6-L8
import Tokenize.Tokens: RawToken, AbstractToken, iskeyword, isliteral, isoperator, untokenize
import Tokenize.Lexers: Lexer, peekchar, iswhitespace
https://github.com/julia-vscode/LanguageServer.jl/blob/0b40df159af055a5faea3922692aae8dd707f233/src/utilities.jl#L167
t = CSTParser.Tokenize.Lexers.next_token(ts)
https://github.com/julia-vscode/DocumentFormat.jl/blob/772e0916a56af1193d5d03464ce11f89a8a58988/src/passes.jl#L207
while !Tokenize.Lexers.eof(ts)
A lot of these have docstrings already. Some of them might be missing a docstring, or have a self-documenting name. Some of them should be rewritten to use the exposed interface (for example, iterate
may be preferred over next_token
).
I think it's not clear to me what the exposed API actually is.
The one that is documented.
However, other packages are using more than just those exports and what is documented in the README.
They use internals of the package that have been found by looking at the source code. Standard caveats apply.
The one that is documented.
You mean e.g. the functions that have docstrings?
No, I mean the ones that are in the documentation (the README). There can be a discussion about exposing more inside Tokenize.jl as public but I just want to be clear that anything that is not listed in the documentation is considered internal and subject to change.
Okay, perfect.
There are already docstrings on not-exposed functions. Why are they there? And assuming that we're not going to remove them, why not generate the "documentation" (not the README) that displays them nicely?
It's been discussed elsewhere a few times that in the Julia ecosystem there's often a conflation among:
- What is documented
- What, if changed in a backwards-incompatible way, causes a semver major version bump
- What is exported by a module
Is that fair to say? And if so, this PR is just about 1, so it's not immediately relevant that this package has a small exposed interface (2 and 3).
EDIT: I re-ordered 1-3 to reflect the nesting relationship that I expect. 3 should be a subset of 2, which should be a subset of 1. To me, that means anything can be documented without consequence.
I don't know of a way in Julia itself ("in band") to distinguish between 1 and 2. I think it's fine if this package uses the README ("out of band") to delineate items in 2.