Comments VS documentation

Open h-2 opened this issue 6 years ago • 5 comments

Is there some way to differentiate between comments and (doxygen-like) documentation?

Typically code comments are

// foo
/* foo */

Doxygen and almost everything else use the following for documentation:

/// foo
//! bar

/** foo */
/*! bar */

Being able to get two different metrics would be extremely helpful in assessing "quality of documentation"...

Thanks!

h-2 avatar Aug 06 '19 13:08 h-2

Unfortunately, cloc's architecture would need a substantial rewrite to identify documentation separately from comments. The closest option would be a new switch resembling --docstring-as-code that counts documentation comments as code, but only for /// markers, not /*! .. */. The block form ends with the same marker as a regular comment, which makes parsing easy to get wrong.
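For illustration only, here is a minimal Python sketch of the line-marker case described above. It is not cloc code (cloc is written in Perl), and the function name classify_line_comments is hypothetical. It assumes a comment counts only when it starts the line, treats any line beginning with /// or //! as documentation, and skips block comments entirely.

```python
def classify_line_comments(lines):
    """Count doc-style (/// or //!) versus plain (//) line comments.

    Assumption: only lines that *begin* with a marker are classified;
    trailing comments after code and /* ... */ blocks are not handled.
    """
    counts = {"doc": 0, "comment": 0, "other": 0}
    for line in lines:
        stripped = line.strip()
        # Check the longer doc markers first, since they also start with "//".
        if stripped.startswith(("///", "//!")):
            counts["doc"] += 1
        elif stripped.startswith("//"):
            counts["comment"] += 1
        else:
            counts["other"] += 1
    return counts
```

The doc markers have to be tested before the plain // marker, because every /// line also starts with //.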

AlDanial avatar Aug 09 '19 15:08 AlDanial

Is it possible to tell cloc to stop parsing a file when it reaches a line matching a regex? It's fairly common in Perl to put the POD (documentation) after the __END__, which cloc reports as comments, but I would like it to be excluded.

CandyAngel avatar Sep 18 '19 12:09 CandyAngel

@CandyAngel : it is possible, and I can understand why stuff after __END__ might not be comments. I need to come up with a reasonable switch name to implement this. Any ideas? --perl-ignore-data is what comes to mind.

AlDanial avatar Sep 19 '19 04:09 AlDanial

@AlDanial I've only had a quick look through the code, so please forgive any misconceptions below!

--ignore-trailing=<regex> seems pretty reasonable and flexible if there are other languages with "end of code" markers. It could be done in read_file() (and around line 5540, which seems to be another code path for file reading?).

It might be more useful as --ignore-trailing=<language>=<regex> for multi-language examinations but there isn't a tidy point to add that as far as I can tell.

Alternatively, trailing could even be another line classification as that can be language specific like comments are (in Perl's case, any lines following __DATA__ or __END__).
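To make the idea concrete, a rough Python sketch of "stop reading at the first line matching a regex" might look like the following. This is an assumption about the behaviour being proposed, not cloc's read_file(); the helper name drop_trailing and the example pattern are hypothetical.

```python
import re


def drop_trailing(lines, pattern):
    """Return the lines that come before the first line matching `pattern`.

    The matching line itself and everything after it are discarded.
    """
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        if regex.search(line):
            break  # end-of-code marker reached; ignore the rest of the file
        kept.append(line)
    return kept


# Hypothetical pattern for Perl's end-of-code markers.
PERL_END_MARKER = r"^__(END|DATA)__\s*$"
```

With this, drop_trailing(source_lines, PERL_END_MARKER) would keep only the code above __END__ or __DATA__, so any POD after the marker would never reach the comment counter.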

--perl-ignore-data would be the only language-specific option, so I'd imagine you want to avoid that in favour of a more general implementation? If you do want to add language-specific options, might I suggest "namespacing" them, e.g. prefix with --ls-<language>- (for language specific, of course :P). Could get pretty messy otherwise!

CandyAngel avatar Sep 19 '19 08:09 CandyAngel

@CandyAngel Yes, the place to make the mod is sub read_file. The user interface, as you hinted at, is the tricky part. I'm thinking along the lines of your suggestion --ignore-trailing=<language>=<regex>.
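As a sketch of what parsing such a value could involve (in Python for illustration, not cloc's Perl; the --ignore-trailing switch is only a proposal in this thread and the function name parse_ignore_trailing is hypothetical), the main design point is to split on the first = only, since the regex itself may contain = characters:

```python
def parse_ignore_trailing(value):
    """Split a '<language>=<regex>' option value into its two parts.

    Only the first '=' separates the fields, so the regex may contain '='.
    Raises ValueError if either part is missing.
    """
    language, sep, regex = value.partition("=")
    if not sep or not language or not regex:
        raise ValueError(f"expected <language>=<regex>, got {value!r}")
    return language, regex
```

For example, parse_ignore_trailing("Perl=^__(END|DATA)__$") would yield the pair ("Perl", "^__(END|DATA)__$").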

To get the ball rolling, please open a new ticket, as I don't want commits to be conflated with the original comments-vs-documentation issue (feel free to cut/paste posts from here to there). Unfortunately, the comments-vs-documentation request is beyond the scope of what cloc is architected to handle.

AlDanial avatar Sep 21 '19 17:09 AlDanial