PackageAnalyzer.jl
Count code from other subdirectories called in runtests.jl
Hi, great package idea!
In several packages, e.g. VoronoiFVM.jl, I run code from an `examples` subdirectory and also a couple of Pluto notebooks during CI. Currently these are not counted as lines of test code.
- Is there any way to handle this situation besides moving all examples to `test`?
- If I have Pluto notebooks containing manifests for tests, how can I be prevented from "cheating" due to the manifests being counted as code lines?
> Is there any way to handle this situation besides moving all examples to `test`?
Hm, so PackageAnalyzer only categorizes code as being in test or not in the `show` method; the actual `Package` object itself just stores a table with lines of code per file.
So one option is to just ignore what is displayed in the `show` method, and count the lines yourself:
julia> pkg = analyze("VoronoiFVM")
Package VoronoiFVM:
* repo: https://github.com/j-fu/VoronoiFVM.jl.git
* uuid: 82b139dc-5afc-11e9-35da-9b9bdfd336f3
* version: 0.18.3
* is reachable: true
* tree hash: a5f4bc559684925f45104513f9abd65570be86ff
* Julia code in `src`: 4926 lines
* Julia code in `test`: 585 lines (10.6% of `test` + `src`)
* documentation in `docs`: 1172 lines (19.2% of `docs` + `src`)
* documentation in README: 10 lines
* has license(s) in file: MIT
  * filename: LICENSE
  * OSI approved: true
* has `docs/make.jl`: true
* has `test/runtests.jl`: true
* has continuous integration: true
  * GitHub Actions
julia> PackageAnalyzer.count_julia_loc(pkg, "test") + PackageAnalyzer.count_julia_loc(pkg, "examples")
3115
However, if the goal is to communicate to other folks how much test code there is (who may not know that the code is in `examples`), I'm not sure what the best way to do that is. If we integrated JuliaSyntax (xref https://github.com/JuliaEcosystem/PackageAnalyzer.jl/issues/63), we could try to look at any `include` statements from `runtests.jl` and follow them. That might be the most satisfying way, and would also make sure that extraneous files in `test` that aren't actually run don't count.
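Until something like JuliaSyntax is integrated, the idea of following `include` statements can be roughly approximated with a regular expression. This is only a sketch, not something PackageAnalyzer does: the names `INCLUDE_RE` and `included_files` are made up for illustration, and the regex only sees literal `include("...")` calls, so it misses computed paths and would also pick up includes inside comments.

```julia
# Rough approximation: find literal `include("...")` calls with a regex and
# follow them recursively. Computed paths such as include(joinpath(dir, name))
# are missed, and includes inside comments are (wrongly) followed.
const INCLUDE_RE = r"\binclude\(\s*\"([^\"]+)\"\s*\)"

function included_files(entry::AbstractString, seen = Set{String}())
    path = abspath(entry)
    # Skip files already visited (avoids cycles) and paths that do not exist.
    (path in seen || !isfile(path)) && return seen
    push!(seen, path)
    for m in eachmatch(INCLUDE_RE, read(path, String))
        # `include` resolves relative paths against the including file's directory.
        included_files(joinpath(dirname(path), m.captures[1]), seen)
    end
    return seen
end
```

Calling `included_files("test/runtests.jl")` from the package root would return the set of files reachable from the test entry point, including any under `examples`, which could then be counted separately from files in `test` that are never run.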
> If I have Pluto notebooks containing manifests for tests, how can I be prevented from "cheating" due to the manifests being counted as code lines?
Ideally, that would be a file with language Julia and sublanguage TOML. However, if I look at all the lines of code parsed from VoronoiFVM,
julia> DataFrame(pkg.lines_of_code)
13×7 DataFrame
 Row │ directory       language  sublanguage  files  code   comments  blanks
     │ String          Symbol    Union…       Int64  Int64  Int64     Int64
─────┼───────────────────────────────────────────────────────────────────────
   1 │ pluto-examples  Julia                      9   9435       784     473
   2 │ pluto-examples  TOML                       2   1089         1     241
   3 │ src             Julia                     19   4926       300    1033
   4 │ examples        Julia                     33   2530       919     881
   5 │ test            Julia                      9    585       123     185
   6 │ test            TOML                       1     17         0       1
   7 │ docs            Julia                      1     97        24      27
   8 │ docs            TOML                       1     13         0       1
   9 │ docs            TeX                        1    318         1      59
  10 │ docs            Markdown                  14      0       719     201
  11 │ docs            Markdown  Julia            2     13         0       0
  12 │ Project.toml    TOML                       1     44         0       2
  13 │ README.md       Markdown                   1      0        10       7
I don't see any sublanguage TOML there. I suspect that is something that would have to be improved in tokei, the program we use to count lines of code, or by switching to a different program.
Pluto notebooks have two strings which contain the TOML contents: `PLUTO_MANIFEST_TOML_CONTENTS` and `PLUTO_PROJECT_TOML_CONTENTS`. Not sure if tokei can be taught to ignore them. The manifests in particular are quite large and would skew the picture.
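As a workaround outside of tokei, the two strings could be stripped from a notebook's source before counting. The following is a minimal sketch under the assumption that both are written as triple-quoted string assignments, which is how Pluto stores them as far as I know; `strip_pluto_toml` and `count_code_lines` are hypothetical helpers, and the line counter is only a crude stand-in for tokei's "code" column.

```julia
# Matches both embedded TOML blocks; the "s" flag lets `.` span newlines.
# Assumption: each block is a triple-quoted string assigned to the constant.
const PLUTO_TOML_RE = Regex(
    "PLUTO_(?:PROJECT|MANIFEST)_TOML_CONTENTS\\s*=\\s*\"\"\".*?\"\"\"", "s")

# Remove the embedded Project/Manifest TOML from notebook source text.
strip_pluto_toml(source::AbstractString) = replace(source, PLUTO_TOML_RE => "")

# Count lines that are neither blank nor start with `#`.
function count_code_lines(source::AbstractString)
    count(eachline(IOBuffer(source))) do line
        s = strip(line)
        !isempty(s) && !startswith(s, "#")
    end
end
```

With these, `count_code_lines(strip_pluto_toml(read(path, String)))` gives a count for a notebook at `path` that no longer includes the manifest.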
As for counting code in additional subdirectories, I have no idea how to catch all possible corner cases in an automated way. For example, I scan the `examples` subdirectory in `runtests.jl` (and have passed this pattern on to other authors...).
Here is what came to my mind: would it make sense to have a configuration file in the repo giving some more info about the package structure and the semantics of some subdirectories? Something like a possible `PackageAnalyzer.toml`:
[TestSubdirs]
test
examples
[SourceSubdirs]
src
assets
[DocSubdirs]
docs
examples
In my case, `examples` counts twice: it is part of docs (via Literate.jl) and part of tests. And `assets`, for example, could contain JavaScript code.
Any output derived from this information beyond the standard subdirectories could be marked as additional info supplied by the package author.
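Strictly speaking, the listing above is not valid TOML, since bare keys need values. One valid way to express the same idea is a table of arrays, which the TOML standard library can parse. This is purely illustrative: neither the file name nor the `subdirs` table or its keys are something PackageAnalyzer reads, and `categories` is a hypothetical helper.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical contents of a `PackageAnalyzer.toml`; neither the file name nor
# the keys are supported by PackageAnalyzer.
config_text = """
[subdirs]
test   = ["test", "examples"]
source = ["src", "assets"]
docs   = ["docs", "examples"]
"""

config = TOML.parse(config_text)

# Return every category a directory belongs to; a directory may appear in several.
function categories(config, dir::AbstractString)
    sort([name for (name, dirs) in config["subdirs"] if dir in dirs])
end

categories(config, "examples")  # ["docs", "test"]
```

Arrays make the double counting explicit: `examples` is listed under both `test` and `docs`.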
Hm, I think a config could make sense. Do you know if there are any existing formats or systems we could use?
My suggestion is just TOML :) The parser is in the stdlib, the syntax is simple, and every package author already knows the format.
Ah right, I got that, I just meant that if we could opt into an existing system for the semantics of it, that might be better than inventing our own.
For example, linguist uses `.gitattributes` files to declare that certain files are in certain languages when autodetection fails: https://github.com/github/linguist/blob/master/docs/overrides.md. We could use that system as well to add a syntax for declaring that certain files belong to certain categories (such as test).
I think using `.gitattributes` for this makes sense. Something like
examples/**.jl analyzer-category=test
would mean: all `.jl` files in `examples` should be assigned the "category" test. This can be coupled with `git check-attr`. For example, if I have a `.gitattributes` with
test/**.jl analyzer-category=test
Then in the shell, I can check particular files like
❯ git check-attr analyzer-category test/runtests.jl
test/runtests.jl: analyzer-category: test
❯ git check-attr analyzer-category src/PackageAnalyzer.jl
src/PackageAnalyzer.jl: analyzer-category: unspecified
So then the `lines_of_code` table can have an additional column for "category" (maybe with some additional logic to determine the default category from the directory), and the `show` method can use this category to determine what to print for lines of test code vs. src code.
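To make the "category" column concrete, here is a small sketch using the Julia rows of the `lines_of_code` table shown earlier in the thread. It simplifies things by assigning categories per directory rather than per file, and the `overrides` dictionary merely stands in for whatever `.gitattributes` would provide; `category` and `julia_loc` are hypothetical names, not PackageAnalyzer functions.

```julia
# Julia rows copied from the `lines_of_code` table earlier in the thread.
rows = [
    (directory = "pluto-examples", language = :Julia, code = 9435),
    (directory = "src",            language = :Julia, code = 4926),
    (directory = "examples",       language = :Julia, code = 2530),
    (directory = "test",           language = :Julia, code = 585),
]

# Hypothetical overrides, standing in for attributes read from `.gitattributes`.
overrides = Dict("examples" => "test")

# Default category is derived from the directory name unless overridden.
function category(directory, overrides)
    haskey(overrides, directory) && return overrides[directory]
    return directory in ("src", "test", "docs") ? directory : "unspecified"
end

# Add the extra "category" column to every row.
categorized = [merge(r, (category = category(r.directory, overrides),)) for r in rows]

# Total Julia lines of code in a given category.
julia_loc(name) = sum(r.code for r in categorized if r.category == name; init = 0)

julia_loc("test")  # 585 + 2530 = 3115, the same total as counted by hand above
```

A `show` method could then report test code from the category column instead of from the directory name.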
The nice thing about using tools like `.gitattributes` and `git check-attr` is that they already have a well-understood syntax (basically the same as `.gitignore`) and tooling that supports nested files and overrides.
E.g. you could have a `.gitattributes` at top-level in your package, and then override it with another one in some subfolder somewhere (and it would only override attributes in that subfolder).
Also, some repos might already have a `.gitattributes` file, so this would mean they wouldn't need an additional file. We also don't have to document the format ourselves, and can just link out to existing docs.
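From Julia, the same query could be made by shelling out to `git check-attr` and parsing its output. A minimal sketch, assuming `git` is on the `PATH` and using the `analyzer-category` attribute name proposed above; `parse_check_attr` and `analyzer_category` are hypothetical helpers.

```julia
# Parse one line of `git check-attr` output, e.g.
# "test/runtests.jl: analyzer-category: test".
# Paths that themselves contain ": " are not handled by this simple split.
function parse_check_attr(line::AbstractString)
    parts = split(chomp(line), ": ")
    length(parts) == 3 || error("unexpected check-attr output: $line")
    path, attr, value = parts
    return (path = String(path), attr = String(attr),
            value = value == "unspecified" ? nothing : String(value))
end

# Query the attribute for one file, relative to the repository at `repo`.
# Returns `nothing` when no `.gitattributes` rule applies to the file.
function analyzer_category(repo::AbstractString, file::AbstractString)
    cmd = `git -C $repo check-attr analyzer-category -- $file`
    return parse_check_attr(readchomp(cmd)).value
end
```

Returning `nothing` for `unspecified` leaves room for the fallback logic that derives a default category from the directory.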
Interesting - didn't know about this possibility. It seems that this might work well.