Yegor Bugayenko
At the moment, bibcop-action [fails](https://github.com/yegor256/sqm/actions/runs/8315547655/job/22754521556) due to many style violations in the `main.bib` file. Let's fix them all.
In `report.tex`, let's show the cost of building the dataset, in dollars. We can derive it from the build time, the number of CPUs, and the amount of memory.
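One way to turn these three inputs into a dollar figure is a simple linear pricing model. The sketch below is only an illustration: the function name `dataset_cost` and both default hourly rates are hypothetical placeholders, not prices taken from any real provider, and would have to be replaced with the rates of the machine actually used.

```python
def dataset_cost(hours, cpus, memory_gb,
                 usd_per_cpu_hour=0.04, usd_per_gb_hour=0.005):
    """Estimate the build cost in dollars with a linear pricing model.

    Both default rates are hypothetical placeholders (assumptions).
    """
    if hours < 0 or cpus < 0 or memory_gb < 0:
        raise ValueError("hours, cpus and memory_gb must be non-negative")
    # Hourly price of the machine: CPU share plus memory share.
    hourly = cpus * usd_per_cpu_hour + memory_gb * usd_per_gb_hour
    # Total cost, rounded to cents.
    return round(hours * hourly, 2)


if __name__ == "__main__":
    # Example: a 10-hour build on 8 CPUs with 32 GB of memory.
    print(dataset_cost(hours=10, cpus=8, memory_gb=32))
```

The resulting number could then be written into a file that `report.tex` includes, in the same way other build-time values are passed to the report.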
For each metric, let's make `steps/aggregate.sh` create `data/summary/{metric}.csv` files, which will have the following structure (for example, `data/summary/LOC.csv`):

```
repository,count,sum,average,mean,min,max
yegor256/cam,28,500,45.3,48.2,1,90
yegor256/cactoos,...
yegor256/takes,...
```

Here:

* `28` is the number...
When packaging everything together, let's include the CaM sources too. This will make the dataset more "reproducible."
It's possible to detect which repositories are being actively maintained; for example, see this study: https://dl.acm.org/doi/abs/10.1145/3239235.3240501. Let's implement such filtering (or something similar) inside `discover-repositories.rb`. See this one...
Similar to #227. Let's filter out repositories that are neither maintained nor in active development. Maybe [this study](https://dl.acm.org/doi/pdf/10.1145/3239235.3240501) can give a hint on how to do this, with...
Let's improve the details of some metrics:

* DOER
* FOUT
* HSD, HSE, HSV
* MIDX
* NCSS
* NULLS

Currently, they are very sketchy, which makes it hard...
There are already too many metrics in the repository, which makes the final report and the dataset hard to read. Let's introduce "groups". Every metric, when it's generated by a script in...
Currently, we only support Java 8, because we use the [javalang](https://github.com/c2nes/javalang) library, which supports only Java 8 and has gone three years without any updates. Let's find a way to either replace it or maybe...
Let's filter out the smallest and the largest repositories (by the number of files), maybe using this statistical approach: https://en.wikipedia.org/wiki/Percentile
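A minimal sketch of such a filter, assuming the nearest-rank definition of a percentile and cut-offs at the 5th and 95th percentiles. The function names and the default cut-offs are hypothetical, chosen only for illustration; the right thresholds would have to be decided from the real distribution of repository sizes.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, for 0 < p <= 100."""
    if not sorted_values:
        raise ValueError("no values given")
    # Rank of the smallest value that has at least p% of the data at or below it.
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def filter_by_size(repos, low=5, high=95):
    """Keep repositories whose file count lies between two percentiles.

    `repos` maps a repository name to its number of files.
    The default cut-offs are assumptions, not values from the project.
    """
    sizes = sorted(repos.values())
    lo = percentile(sizes, low)
    hi = percentile(sizes, high)
    return {name: n for name, n in repos.items() if lo <= n <= hi}


if __name__ == "__main__":
    # Twenty made-up repositories with 10, 20, ..., 200 files.
    repos = {f"r{i}": i * 10 for i in range(1, 21)}
    kept = filter_by_size(repos, low=10, high=90)
    print(sorted(kept, key=repos.get))
```

With the nearest-rank definition the bounds are always actual file counts from the data, so no interpolation is needed; an interpolating definition would give slightly different cut-offs on small samples.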