Add option to exclude shebang from the count

Open mpsijm opened this issue 3 months ago • 4 comments

Hi! Thanks for maintaining cloc for all these years :smile:

I have been manually changing cloc to exclude the shebang from the count for quite some time now, by commenting out this line:

https://github.com/AlDanial/cloc/blob/657038f9b465efaffe7b44c53ad9e8a76694298b/cloc#L7078

Can you explain why cloc considers a shebang as a line of code? And if you think that the shebang should be part of the count by default, can we add an option to exclude it?

I can probably write the PR myself, I'm just not sure what the option should be called, because something like --exclude-shebang may also imply that the shebang is not used to detect the language, which is not what I want :slightly_smiling_face:

mpsijm avatar Apr 13 '24 19:04 mpsijm

cloc counts the #! line because it is a line of code. The invoked interpreter acts on the arguments given to it. Here's an example:

#!/usr/bin/perl -n -i
print if /4/;

One line or two? If you only count the print statement you miss the bulk of the logic which in this case is "loop over every line in the input file and perform an in-place edit but do not print the result." Adding print if /4/ means that only lines with the character 4 will be saved.
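
To make the hidden logic concrete, here is a rough Python sketch of what the two-line Perl script above does once the -n and -i switches are taken into account. The function name keep_lines_matching and its parameters are made up for illustration; this is only an approximation of Perl's behavior, not how Perl implements it.

```python
import re
from pathlib import Path


def keep_lines_matching(path, pattern=r"4"):
    """Rewrite `path` in place, keeping only lines that match `pattern`.

    Hypothetical helper that approximates `perl -n -i` with `print if /4/;`.
    """
    file = Path(path)
    # -n: loop over every input line without printing it automatically.
    lines = file.read_text().splitlines(keepends=True)
    # print if /4/: emit a line only when the pattern matches.
    kept = [line for line in lines if re.search(pattern, line)]
    # -i: the output replaces the original file.
    file.write_text("".join(kept))
    return kept
```

Most of the function body corresponds to the two switches on the #! line; only the list comprehension corresponds to the print statement.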

I'll grant you that few scripts with #! add arguments after the interpreter.

Adding a --exclude-pound-bang switch? I don't know, that seems a bit much when we're just talking about one line per script. However, you're exactly on the right track with regard to making cloc treat the #! line as a comment. Just comment out the entire "Exception for scripting languages" block. In my copy that would be

 7071  #  chomp( $original_lines[0] );
 7072  #  if (defined $Script_Language{$language} and
 7073  #      $original_lines[0] =~ /^#!/ and
 7074  #      (!scalar(@lines) or ($lines[0] ne $original_lines[0]))) {
 7075  #      unshift @lines, $original_lines[0];  # add the first line back
 7076  #  }

The script language is already identified by this point so no harm done.
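
For readers who don't read Perl, the block above can be paraphrased in Python roughly as follows. This is only a sketch: restore_shebang, is_script_language, and count_shebang are hypothetical names, count_shebang is not an existing cloc option, and the sketch assumes the comment-stripped lines are stored without trailing newlines.

```python
def restore_shebang(original_lines, lines, is_script_language, count_shebang=True):
    """Return `lines` with the #! line put back if comment stripping removed it.

    `original_lines` is the file as read; `lines` is what remains after
    comments and blanks were removed. `count_shebang` is a hypothetical
    switch; passing False mimics commenting the block out.
    """
    if not count_shebang or not is_script_language or not original_lines:
        return list(lines)
    first = original_lines[0].rstrip("\n")  # like chomp() on the first line
    # Add the first line back only if it is a shebang that is no longer
    # at the head of the surviving lines.
    if first.startswith("#!") and (not lines or lines[0] != first):
        return [first] + list(lines)
    return list(lines)
```

With count_shebang=False the function returns the stripped lines unchanged, which is the effect of commenting the block out: the #! line stays classified as a comment.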

AlDanial avatar Apr 19 '24 22:04 AlDanial

Thanks, I learned something today! :smile: I only ever used standard shebangs with things like /bin/bash and /usr/bin/python3, didn't know that you could also pass extra arguments there :slightly_smiling_face:

In my particular use case, I use cloc to find the smallest source file in a folder. All source files solve the same problem, so it's basically a form of code golf. And when the Python files are only a few lines long, adding the shebang line makes a difference :stuck_out_tongue:

Perhaps we can count the shebang line only for languages where command-line parameters matter (like perl), or only when it actually contains arguments? But, it's also totally fair if you would rather not implement this proposal upstream, given the complicated technicalities. In that case, I'll just keep the commenting-out fix on my own machine :slightly_smiling_face: Thanks for verifying that I'm on the right track though :smile:
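
As a rough illustration of the "only when it actually contains arguments" idea, a check could look something like the Python sketch below. The function shebang_has_arguments is hypothetical and not part of cloc, and treating env as a launcher (so that the word after it names the interpreter rather than an argument) is an assumption of this sketch.

```python
def shebang_has_arguments(first_line):
    """Return True if a #! line passes arguments to the interpreter.

    Hypothetical helper, not part of cloc.
    """
    if not first_line.startswith("#!"):
        return False
    words = first_line[2:].split()
    if not words:
        return False
    # Assumption: `env` is only a launcher, so skip it and any options of
    # its own (such as -S); the next word names the real interpreter.
    if words[0].rsplit("/", 1)[-1] == "env":
        words = words[1:]
        while words and words[0].startswith("-"):
            words = words[1:]
    # Anything after the interpreter itself counts as an argument.
    return len(words) > 1
```

Under this rule #!/usr/bin/perl -n -i would be counted as code, while #!/bin/bash and #!/usr/bin/env python3 would not.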

mpsijm avatar Apr 20 '24 19:04 mpsijm

But it isn't just perl that uses #! arguments. My work colleagues frequently add -u to the python #! line to un-buffer stdout and stderr, i.e., #!/usr/bin/env python3 -u.

In any case, regarding code golf, wouldn't the results be the same whether all scripts count the #! line or none of them do?

AlDanial avatar May 01 '24 02:05 AlDanial

Again, learned something new, thanks! :smile:

For the code golf, we also accept files in non-scripting languages, e.g. C++ and Java, which do not require the shebang.
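
A small Python sketch of why this matters when languages are mixed. The file names and line counts are made up, and smallest_files is a hypothetical helper rather than anything cloc provides.

```python
def smallest_files(files, count_shebang):
    """Return the names of the files with the fewest counted lines.

    `files` holds (name, code_lines_excluding_shebang, has_shebang) tuples.
    """
    def size(entry):
        _name, code, has_shebang = entry
        return code + (1 if has_shebang and count_shebang else 0)

    best = min(size(entry) for entry in files)
    return [entry[0] for entry in files if size(entry) == best]


# Made-up entries: a Python script with a shebang and a C++ file without one.
entries = [("solve.py", 3, True), ("solve.cpp", 4, False)]
print(smallest_files(entries, count_shebang=True))   # ['solve.py', 'solve.cpp']
print(smallest_files(entries, count_shebang=False))  # ['solve.py']
```

With the shebang counted, the two files tie at four lines; without it, the Python script wins outright. The extra line only affects the languages that need it.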

mpsijm avatar May 01 '24 09:05 mpsijm