Improve filtering of reports by directory, particularly for use with monorepos

Open ambirdsall-gogo opened this issue 4 years ago • 2 comments

First, thank you for an interesting and inspiring project, and for presenting such an effective introduction at emacsconf.

My company's product, a web app, is built on a ten-year-old codebase that was written by a much larger team than we have now. It is important for us not to trip over the large amount of code we didn't write ourselves, and to understand where old code stands in the way of new projects. Code compass looks like a very nice way to measure and communicate these things. However, I'm finding I need to make certain workarounds as I experiment with code-compass in my company's monorepo, and I think I could expand some of these into general improvements. It's worth noting that I am at the beginning of an incremental investigation of the different functions code-compass provides; once I am satisfied with my ability to generate a hotspots visualization, I will move on to other types of report, which may uncover new issues to be solved for my use case.

I find that generating a visualization like (c/show-hotspots path-to-monorepo "3m") visualizes so many files that it causes JavaScript performance issues in the generated page and makes it hard to drill down on any particular sub-project. I have observed two distinct causes:

  1. The sheer amount of potentially relevant source code in a ten-year-old monorepo containing many sub-projects is just too big. There is a large monolithic Rails app, several backend services, and multiple frontend codebases spanning two major frontend frameworks in both JavaScript and TypeScript.
  2. The JavaScript codebases each have their own node_modules directory of third-party library code, and these are not filtered out by default.

The second is simpler, so I am starting there: since the locally-cached library code is kept out of version control by .gitignore, I believe the only necessary step is providing an --exclude-dir argument to the cloc executable. For my personal workaround, I simply added the argument to the command string directly, for convenience, but for general consumption, extracting a customizable list like the following seems reasonable:

(defcustom c/exclude-directories
  '("node_modules" "vendor" "tmp")
  "Directory names to exclude from reports.
The names are joined with commas and passed to the cloc executable
via its --exclude-dir argument."
  :type '(repeat string)
  :group 'code-compass)

(defun c/produce-cloc-report (repository)
  "Create cloc report for REPOSITORY."
  (message "Producing cloc report...")
  (shell-command
   (format "(cd %s; cloc ./ --by-file --csv --quiet --skip-uniqueness --exclude-dir=%s) > cloc.csv" repository (string-join c/exclude-directories ",")))
  repository)

If the basic approach for filtering out low-value files I outlined above looks reasonable to you, I can open a pull request in the coming days; I filled in my current best guess for sane defaults, though I'm sure it could be improved.
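One edge case in the snippet above: if the list is customized to be empty, the command ends up with a bare `--exclude-dir=`. Note also that cloc's `--exclude-dir` takes individual directory names, not paths containing separators. A minimal sketch of a helper that omits the flag when there is nothing to exclude; the name `my/cloc-exclude-dir-flag` is hypothetical and not part of code-compass:

```elisp
(require 'subr-x) ; provides `string-join' on older Emacs versions

;; Hypothetical helper, not part of code-compass: builds the cloc flag
;; from a list of directory names, or returns "" when the list is empty
;; so the command never contains a bare "--exclude-dir=".
(defun my/cloc-exclude-dir-flag (directories)
  "Return a cloc --exclude-dir flag for DIRECTORIES, or \"\" if nil."
  (if directories
      (concat "--exclude-dir=" (string-join directories ","))
    ""))

(my/cloc-exclude-dir-flag '("node_modules" "vendor" "tmp"))
;; => "--exclude-dir=node_modules,vendor,tmp"
(my/cloc-exclude-dir-flag nil)
;; => ""
```

The result could be spliced into the cloc command string with a single `%s` in place of the hard-coded `--exclude-dir=%s`.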

The first issue, where I would like to filter reports to one or more specific project directories within a monorepo, does not have as obvious a solution, either in terms of implementation or user interface. It might be possible to hack filtered reports together by abusing the --exclude-dir argument (it may even be possible to build a clean API on top of such a hack), but that seems suboptimal for several reasons. For example, the git operations would still operate over the entire repo, and I'm still in the process of figuring out how this may affect the generated reports. I will continue to think on this problem as I continue my experimentation and learn the workings of code-compass better, and would welcome any suggestions or targeted questions you may be able to offer.
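One possible direction for the git side, sketched under assumptions: `git log` accepts pathspecs after `--`, so the history itself can be restricted to selected subdirectories. The function name, the format of the SINCE argument, and the choice of `--numstat` below are illustrative assumptions; this is not how code-compass currently builds its git log.

```elisp
;; Hypothetical helper, not part of code-compass.
;; Builds a git log command whose history is limited to DIRECTORIES
;; by passing them as pathspecs after "--".
(defun my/git-log-command-for-dirs (since directories)
  "Return a git log command limited to DIRECTORIES for commits after SINCE.
SINCE is a date string git understands, e.g. \"2021-01-05\".
DIRECTORIES are paths relative to the repository root."
  (format "git log --numstat --after=%s -- %s"
          (shell-quote-argument since)
          (mapconcat #'shell-quote-argument directories " ")))

;; Example call with made-up sub-project paths:
(my/git-log-command-for-dirs "2021-01-05" '("services/api" "frontend/app"))
```

The same list of directories could in principle be handed to cloc as its scan targets instead of `./`, so that both inputs cover the same subset of files.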

ambirdsall-gogo • Apr 05 '21 19:04

Hi @ambirdsall-gogo, I am really happy you are using code-compass: I made this mode to help users like you! The idea of the directory exclusion is good, and I would welcome a PR. One question: are the projects in your monorepo Git repositories themselves? If so, you will like the feature I plan to release this week. It lets you run a hotspot analysis on a directory containing multiple repositories. You can also specify the repositories to pick in a text file, which may help with your filtering needs.

ag91 • Apr 06 '21 17:04

Great, I will put that first PR together soon when I have some free time.

> One question: are the projects in your monorepo Git repositories themselves?

There is a single git repository for the entire group of projects, each of which is, from git's perspective, just another subdirectory within a single project. I would be surprised if the new feature worked for my use case without any modification, but it has significant overlap with what I'm hoping to do, and I look forward to seeing what I can learn from it. If nothing else, it will be an example of how the API could be designed for the filtering I seek.

Meanwhile, I have to do some more reading of the source code to clarify my understanding of the report-generation process, starting with the input data for the d3 visualizations. If the generated cloc report is what defines the set of circles to be drawn, with the git analyses being queried to determine each circle's fill, then that may be the correct entry point for filtering a git directory to a subset of its source files; but since I am speculating about things I can learn for myself, I will stop there for now.

ambirdsall-gogo • Apr 06 '21 20:04