Merge multiple project files into multiple project outputs (MIMO)

Open Richargh opened this issue 5 months ago • 2 comments

Feature request

Improve merge so it is better at handling typical microservice merge cases where we merge multiple files for multiple repos.

Description

As an auditor, I want to merge multiple repositories automatically so that merging multiple microservice cc.jsons is much easier.

Context

Let's say you have an audit where a team has divided their code up into 20 repositories. Then you'll have 20 .sonar.cc.json and 20 .git.cc.json files which you have to merge. Perhaps these files are organized like this:

├📁 sonar  
├─📄 prj1.sonar.cc.json                
├─📄 ...
├─📄 prj20.sonar.cc.json               
├📁 git
├─📄 prj1.git.cc.json
├─📄 ...
├─📄 prj20.git.cc.json
├📁 raw # perhaps you have another folder with raw metrics
├📁 merge # empty right now, but this could be the folder where the merge result would be

Or perhaps these files are organized like this:

├📁 projects  
├─📄 prj1.sonar.cc.json                
├─📄 prj1.git.cc.json
├─📄 ...
├─📄 prj20.sonar.cc.json               
├─📄 prj20.git.cc.json
├📁 merge # empty right now, but this could be the folder where the merge result would be

Note that in both cases the project names match exactly; only the git/sonar/raw part of the file name differs.

It would be great if you could merge these projects via the command line so that the result looks like this:

├📁 merge
├─📄 prj1.merge.cc.json                
├─📄 ...
├─📄 prj20.merge.cc.json               
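
To make the intended mapping concrete, here is a minimal sketch of how the inputs could be grouped into outputs. It is not the ccsh implementation: `dot_prefix` and `plan_merges` are hypothetical names, and it assumes the project name is everything before the first dot of the file name.

```python
from collections import defaultdict
from pathlib import PurePath


def dot_prefix(path: str) -> str:
    """Return the part of the file name before its first dot (assumed project name)."""
    return PurePath(path).name.split(".", 1)[0]


def plan_merges(input_files, output_dir="merge"):
    """Map each planned output path to the sorted list of inputs that feed it."""
    groups = defaultdict(list)
    for path in input_files:
        groups[dot_prefix(path)].append(path)
    return {
        f"{output_dir}/{prefix}.merge.cc.json": sorted(paths)
        for prefix, paths in sorted(groups.items())
    }


# Hypothetical input list; in practice this would come from scanning the folders.
plan = plan_merges([
    "sonar/prj1.sonar.cc.json",
    "git/prj1.git.cc.json",
    "sonar/prj20.sonar.cc.json",
    "git/prj20.git.cc.json",
])
for output, inputs in plan.items():
    print(output, "<-", ", ".join(inputs))
```

The same grouping works for both folder layouts above, because only the file name is inspected and the folder is ignored.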

Acceptance criteria

  • One new command-line argument exists to merge multiple files with the same name into multiple output files. For example:
    • ccsh merge sonar/ git/ raw/ -mimo MATCH_BY_DOT_PREFIX -o merge/
    • ccsh merge projects/ -mimo MATCH_BY_DOT_PREFIX -o merge/
    • MIMO = Multiple Inputs & Multiple Outputs
    • Removed for now: the MATCH_BY_DOT_PREFIX strategy argument. It was meant to leave room for other matching strategies in the future, with the one proposed here named MATCH_BY_DOT_PREFIX. If no other strategy is proposed, this parameter could be changed to the default one, so it does not have to be specified any more.
  • Files are matched based on their prefix before the "dot".
    • prj1.sonar.cc.json, prj1.git.cc.json and prj1.raw.cc.json would be matched because they share the prefix prj1.
  • If a file could not be matched to any other file, a warning is reported to the error stream.
  • If a file could not be matched, files that have a Levenshtein distance of less than 3 are suggested.
    • This is done so typos are easy to identify.
    • For example, mailbox.sonar.cc.json exists and does not have a match. There is also a malbox.git.cc.json; notice that the i is missing. The prefix malbox then has a Levenshtein distance of 1 to the prefix mailbox, so malbox.git.cc.json should be suggested as a merge target.
    • It is up to the user to fix file typos.
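
The warning and suggestion criteria could be sketched roughly as follows. This is only an illustration under assumptions: `levenshtein` and `suggest_matches` are hypothetical helpers, the distance is computed on the dot prefixes rather than the full file names, and a file counts as unmatched when its prefix has only one file.

```python
import sys


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


def suggest_matches(groups, max_distance=3, stream=sys.stderr):
    """Warn about every prefix with a single file and return nearby prefixes.

    groups maps a dot prefix to the list of files that share it.
    A prefix is suggested when its distance is less than max_distance.
    """
    suggestions = {}
    for prefix, files in groups.items():
        if len(files) > 1:
            continue  # matched, nothing to report
        near = sorted(
            other for other in groups
            if other != prefix and levenshtein(prefix, other) < max_distance
        )
        suggestions[prefix] = near
        hint = f"; did you mean: {', '.join(near)}?" if near else ""
        print(f"warning: {files[0]} has no match{hint}", file=stream)
    return suggestions


# Example data taken from the mailbox/malbox case described above.
groups = {
    "mailbox": ["mailbox.sonar.cc.json"],
    "malbox": ["malbox.git.cc.json"],
    "prj1": ["prj1.sonar.cc.json", "prj1.git.cc.json"],
}
result = suggest_matches(groups)
```

Comparing prefixes instead of whole file names keeps the git/sonar/raw part from inflating the distance; whether the real feature should do the same is an open design choice.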

Development notes (optional Task Breakdown)

  • [ ]
  • [ ]
  • [ ]

Open questions

Richargh, Sep 16 '24 08:09