Proposal: add a `--by-dir` option
I often find it useful to list code size by directory; use cases are:
- Knowing which parts of a codebase take up how many lines of code, e.g. "70% of the code is in the `foo` package".
- The grand totals can give a skewed picture because they may include things like included libraries, tests, build infrastructure, and so forth that you probably want to exclude; listing it by directory first is a convenient way to get this overview (aside: a `--test` flag, similar to `--gen`, might also make sense, especially for environments where tests aren't in separate directories, like Go, but that's a different issue).
Right now there's `--by-file`, but that's too fine-grained. So I propose to add a new flag:
`--by-dir=n`: display output for every directory at most n levels deep.
The parameter works the same as `-d` for e.g. `du`; I could add a new `--depth` parameter, but I think just accepting a value here makes more sense.
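For what it's worth, the aggregation itself seems straightforward: truncate each file's relative path to its first n components and sum the per-file counts per prefix. A minimal sketch in Go (the `truncateDepth` helper and the map-based accumulation are my own illustration, not scc's internals; the numbers are made up):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// truncateDepth returns the first n components of a relative path,
// e.g. truncateDepth("src/ui/app.go", 2) == "src/ui".
func truncateDepth(path string, n int) string {
	parts := strings.Split(filepath.ToSlash(path), "/")
	if len(parts) > n {
		parts = parts[:n]
	}
	return strings.Join(parts, "/")
}

func main() {
	// Per-file line counts as scc might produce them (made-up numbers).
	files := map[string]int{
		"src/ui/app.go":        120,
		"src/ui/theme.go":      80,
		"src/backend/db.go":    300,
		"test/unit/db_test.go": 90,
	}

	// --by-dir=1 style rollup: sum counts per top-level directory.
	totals := map[string]int{}
	for path, lines := range files {
		totals[truncateDepth(path, 1)] += lines
	}
	fmt.Println(totals) // map[src:500 test:90]
}
```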
I'll work on a patch, but I wanted to open an issue first to discuss it, to avoid working on something that may not get merged.
Thanks!
Sounds like a reasonable request. I could see it being pretty useful for those who don't want to use SQL with the new (yet-to-be-released) SQL option.
If you are going to look into this, you might want to hold off until Go 1.16, because the changes there to the file/path functions look like something that should be integrated. It's certainly something I want to evaluate. I'll have to try out the new walking/directory methods and compare them to what's in scc currently to see whether that's the correct path forward, though.
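For reference, the Go 1.16 additions in question are `os.ReadDir` and the `fs.DirEntry`-based `filepath.WalkDir`, which avoids the extra per-entry stat call that `filepath.Walk` performs. A minimal sketch of the new API (not scc's actual walker):

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

func main() {
	// filepath.WalkDir (Go 1.16+) passes a fs.DirEntry instead of a
	// fs.FileInfo, so it can avoid an os.Lstat per entry.
	err := filepath.WalkDir(".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() && d.Name() == ".git" {
			return filepath.SkipDir // don't descend into VCS metadata
		}
		if !d.IsDir() {
			fmt.Println(path)
		}
		return nil
	})
	if err != nil {
		fmt.Println("walk failed:", err)
	}
}
```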
Odds are I would merge this though, so long as it's done in such a way that I can maintain it.
I think an even more useful option would be `--by-arg`. Then you could do `scc --by-arg $(fd -t d --max-depth X)` ;-)
Or even `scc --by-arg $(fd -t d --exact-depth X)`
I may be wrong on this, but isn't it possible to do
`scc --by-file $(fd -t d --exact-depth X)`
I do this from time to time with fzf, when I want stats for a single file:

`scc $(fzf)`
@boyter I think if you do that, you will get the results by file anyway (the tool collates all arguments).
I think what we're asking for is for the tool to report each argument separately. So it's kind of equivalent to doing a shell for loop and running scc for each directory (but an order of magnitude faster because it can cache results).
In fact, that is what I'm currently doing. I'm building a tool that uses scc to add a `.metadata.json` file in every directory, holding the number of lines of each immediate child (file/subdirectory). This can be super useful for tools like LSD, NeoTree, Ranger, or web interfaces (git forges) to show the SLOC of every file and directory in the file tree.
As mentioned, I currently have to run scc on every single directory. This is quite slow (a largish codebase of 0.5M lines takes 3 min on my M1). Even worse, it grows superlinearly, because larger codebases have more nesting. (E.g., the Linux repo at 25M LoC is intractable.)
(Alternatively, if you think the idea/standard I'm trying to advance is a good one, scc could create the JSON files itself. Happy to work with you on that.)
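For concreteness, the per-directory metadata I have in mind is essentially a map from each immediate child to its line count. A sketch in Go (the `Metadata` shape and field names are just my working format, nothing standardized):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata is one hypothetical shape for a per-directory .metadata.json:
// line counts for each immediate child (file or subdirectory).
type Metadata struct {
	Children map[string]int `json:"children"` // name -> total lines
}

func main() {
	m := Metadata{Children: map[string]int{
		"app.go": 120, // a file's own line count
		"ui":     500, // a subdirectory's rolled-up total
	}}
	out, _ := json.MarshalIndent(m, "", "  ")
	fmt.Println(string(out))
}
```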
I came by to ask for a similar feature. Specifically, what I want is sub-total reports per directory. That means that in a repository with directories `src/ui`, `src/backend`, `test/unit`, and `test/integration` there would be report lines for all of those directories individually, then for `src` and `test`, and then a top-level report. This should allow for limiting the depth, like the original proposal. This is similar to a `GROUP BY ROLLUP` as allowed by some SQL engines.
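The rollup itself is cheap once per-file counts exist: credit each file's count to every ancestor directory in one pass. A sketch of what I mean (my own illustration with made-up numbers, not scc code):

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

func main() {
	// Per-file line counts (made-up numbers).
	files := map[string]int{
		"src/ui/app.go":           120,
		"src/backend/db.go":       300,
		"test/unit/db_test.go":    90,
		"test/integration/e2e.go": 60,
	}

	// GROUP BY ROLLUP equivalent: credit each file to every ancestor.
	totals := map[string]int{}
	for p, lines := range files {
		for dir := path.Dir(p); dir != "."; dir = path.Dir(dir) {
			totals[dir] += lines
		}
		totals["."] += lines // top-level grand total
	}

	// Print sorted for stable output.
	dirs := make([]string, 0, len(totals))
	for d := range totals {
		dirs = append(dirs, d)
	}
	sort.Strings(dirs)
	for _, d := range dirs {
		fmt.Printf("%-18s %d\n", d, totals[d])
	}
}
```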
Thanks for the comment. This is now back on my list of things to investigate.