CodeTriage
Add more languages to doc parsing
https://github.com/codetriage/codetriage/blob/cd43e36d47e5828dcacf79ec18fcbec21b850118/app/models/repo.rb#L22
Love this project, thanks to all the contributors!
I noticed that (only?) Ruby is supported for doc parsing. Are there any plans to start adding other languages (e.g. Python)?
I searched through the issues (both open and closed) and didn't see this mentioned already, so I hope I don't accidentally get triaged on CodeTriage 😱. I'm happy to try to work on this, with the caveat that I'm primarily a Python person just learning Rails. I'd be happier to let someone who's better at Rails have a go.
Sorry for the delay in getting back to you.
Funny you mention wanting to get Python on here: the Python ecosystem was my original inspiration for the feature. I had a dream that one day Ruby could be as well documented as Python was.
I would say that in the year(s?) I've been running this experiment, the documentation part of the app is not exactly what I had hoped it would be. From an architecture standpoint, it takes up a LOT of space in the database even with only a few projects using it, and the doc pages and emails are some of the slowest parts of the app. From a practical standpoint, I'm not sure anyone is really modifying docs based on the emails I'm sending them. In the time this feature has existed, 294 people have clicked on a documentation link that I've sent. Granted, that's better than zero, but it needs to be higher to justify the effort and expense of the feature. I think people need some help knowing what to add or how to help with docs. I've got some ideas there, but they're mostly about educating existing users.
I'm not opposed to adding support for more languages; it just hasn't been as big a priority as reining in the existing docs feature. I actually think adding support for Python could be really interesting, since documentation is such a core part of that community. Perhaps Ruby docs just weren't used much because there isn't an existing community culture that cares about them (as much).
I think we'll end up needing to fundamentally shift how we handle documentation, but I'm not sure how.
That's all not really your concern. What I think could work for adding a new language without overwhelming the system is to make documentation a "beta" feature that repos apply for; that would help a little with scaling.
Python parsing
The biggest hurdle for Python docs is being able to parse any arbitrary Python (2 and 3) project and pull out its documentation. You can see how I've done this with the YARD class in /lib/docs_doctor/parsers/ruby/yard.rb. YARD is a Ruby documentation tool.
I'm assuming other documentation tools will have to run in their native languages and then output to some intermediate format, such as JSON, that a Ruby process could then read.
If you're interested in moving Python docs forward, could you write a Python script that, given the location of a Python project on disk, outputs its documentation information? Something like:
```
$ python process_repo_to_json_docs --repo=/path/to/python_project --output=/path/to/outputfile.json
```
You can see the types of things that I care about for getting information into the database via the YARD class. Mostly class names, method names, line numbers, as well as the comments themselves (I'm calling them "comments").
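As a starting point, here is a minimal sketch of the kind of script described above, built on Python's standard-library `ast` module. The record fields (`file`, `kind`, `name`, `line`, `comment`) and the `SKIP_DIRS` set are guesses for illustration, not the YARD class's actual schema. One real limitation: `ast.parse` only understands the syntax of the Python 3 interpreter running it, so Python 2-only files are skipped rather than parsed, which falls short of the "2 and 3" goal.

```python
"""Hypothetical process_repo_to_json_docs: dump a Python project's docs as JSON Lines."""
import argparse
import ast
import json
import os
from typing import Iterator, List, Optional

# Assumed list of directories that never contain project source worth documenting.
SKIP_DIRS = {".git", ".tox", "venv", ".venv", "__pycache__", "node_modules"}


def _walk(node: ast.AST, rel_path: str, prefix: str) -> Iterator[dict]:
    """Recursively yield a record for every class/function nested anywhere under node."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            name = f"{prefix}.{child.name}" if prefix else child.name
            yield {
                "file": rel_path,
                "kind": "class" if isinstance(child, ast.ClassDef) else "function",
                "name": name,                          # dotted path, e.g. "Foo.bar"
                "line": child.lineno,
                "comment": ast.get_docstring(child),   # None when undocumented
            }
            yield from _walk(child, rel_path, name)
        else:
            # Descend into if/try/with blocks etc. without changing the name prefix
            yield from _walk(child, rel_path, prefix)


def iter_repo_docs(repo: str) -> Iterator[dict]:
    """Yield doc records for every .py file under repo that parses cleanly."""
    for root, dirs, files in os.walk(repo):
        dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)  # prune in place
        for filename in sorted(files):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(root, filename)
            with open(path, "rb") as f:
                source = f.read()
            try:
                tree = ast.parse(source, filename=path)
            except (SyntaxError, ValueError):
                continue  # e.g. Python 2-only syntax or bad encoding: skipped here
            yield from _walk(tree, os.path.relpath(path, repo), "")


def main(argv: Optional[List[str]] = None) -> int:
    """Parse --repo/--output, write one JSON object per line, return the record count."""
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--repo", required=True, help="path to the Python project")
    parser.add_argument("--output", required=True, help="JSON Lines file to write")
    args = parser.parse_args(argv)
    count = 0
    with open(args.output, "w", encoding="utf-8") as out:
        for record in iter_repo_docs(args.repo):
            out.write(json.dumps(record) + "\n")  # newline-delimited, streamable
            count += 1
    return count
```

To run it as the command shown earlier, save it as a file and add an `if __name__ == "__main__": main()` guard at the bottom. Writing each record as it is produced keeps memory flat on the producer side too, which pairs with the streaming format discussed next.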
Ideally, the output format would not have to be read into memory all at once but could be processed in a streaming way. A format like this:
```
[
  {},
  {},
  {},
  #...
]
```
Is harder to stream in than something like this:
```
{}
{}
{}
```
(where each line is a valid JSON object and the lines are separated by newlines; this is the format usually called JSON Lines or NDJSON).
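To make the streaming point concrete, here is a small sketch of the consuming side, written in Python for consistency with the other examples (the function name `iter_json_lines` is made up). With a single JSON array, `json.load` has to decode the whole document before the first element is available; with one object per line, each line can be decoded and handled on its own.

```python
import io
import json
from typing import IO, Iterator


def iter_json_lines(stream: IO[str]) -> Iterator[dict]:
    """Yield one decoded object per non-blank line, reading a line at a time."""
    for lineno, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            # Report the offending line number along with the decoder's message
            raise ValueError(f"line {lineno}: {err.msg}") from err


# An in-memory stand-in for a file; with a real file, pass the object
# returned by open(path, encoding="utf-8") instead.
sample = io.StringIO('{"name": "Foo", "line": 1}\n{"name": "Foo.bar", "line": 3}\n')
names = [record["name"] for record in iter_json_lines(sample)]
```

On the Ruby side, the equivalent loop would likely be `File.foreach(path) { |line| JSON.parse(line) }`, which also reads one line at a time.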
Though we can talk more about the format later if you want.
Anywhoo, that's what I'm thinking for now. Any questions? Think you could start to tackle something like that?
Hey - thanks for the detailed response, as well as clear guidance, that's really helpful. I'd love to give it a shot. I probably won't have a branch up for a week or so, as I've got a couple other commitments.
This is an old issue, but I'm also very interested in adding more languages to doc parsing. I'm wondering how we might get this moving for something like PHP, or for some of the static-site docs-as-code repos (like Pantheon's, to give just one example: https://github.com/pantheon-systems/documentation).
I'm part of a larger ecosystem of docs maintainers who have a variety of learning resources on how to write good documentation, how to find good docs issues on which to work, and where to go for help. CodeTriage is such a great resource; I'd love to partner with someone here to expand this to include more OSS docs projects.
Going from language support where N=1 to N=2 (or more) is proving to be a barrier. This feature largely works with Ruby right now because the app is already in Ruby and I can do things in memory directly. To support other languages, I would need some architecture that either gets those languages onto CodeTriage's production system or introduces some kind of microservices approach, where we could independently deploy a new app for each supported language.
After launching doc support for Ruby, I didn't really see much of a change or an impact from the feature alone. My own focus right now has been writing a comprehensive manual for what to actually do when people get these emails. That project has somewhat snowballed, and it looks like it's going to be a full-fledged book. Just letting you know where I'm at.
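One way to picture the "separately deployed parser per language" idea is a small contract: each language gets a standalone parser command that takes a repo path and prints JSON Lines to stdout, and the host app just runs it and streams the output. The sketch below assumes that contract; the `PARSERS` registry and the `python-docs-to-jsonl` command name are hypothetical, and a real microservice version would swap the subprocess for an HTTP call.

```python
import json
import subprocess
import sys
from typing import Dict, Iterator, List

# Hypothetical registry: each supported language maps to a standalone parser
# command (deployable on its own) that takes a repo path as its last argument
# and prints one JSON object per line to stdout. The command name is made up.
PARSERS: Dict[str, List[str]] = {
    "python": ["python-docs-to-jsonl"],
}


def run_parser(command: List[str]) -> Iterator[dict]:
    """Run an external parser and stream its JSON Lines output as dicts."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            if line.strip():
                yield json.loads(line)
    # Leaving the with-block waits for the process, so returncode is set here
    if proc.returncode != 0:
        raise RuntimeError(f"{command[0]} exited with status {proc.returncode}")


def docs_for(language: str, repo_path: str) -> Iterator[dict]:
    """Look up the parser registered for a language and stream its records."""
    if language not in PARSERS:
        raise ValueError(f"no parser registered for {language!r}")
    return run_parser(PARSERS[language] + [repo_path])


if __name__ == "__main__":
    # Stand-in "parser" that prints two records, just to exercise the plumbing.
    fake = [sys.executable, "-c",
            "print('{\"name\": \"Foo\"}'); print('{\"name\": \"Foo.bar\"}')"]
    for record in run_parser(fake):
        print(record["name"])
```

The appeal of this design is that adding language N+1 means shipping one more command that honors the same output format, without changing the host app's runtime.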