Yegor Bugayenko

Results 301 issues of Yegor Bugayenko

At the moment, bibcop-action [fails](https://github.com/yegor256/sqm/actions/runs/8315547655/job/22754521556) due to many style violations in the `main.bib` file. Let's fix them all.

In `report.tex`, let's show the cost of building the dataset, in dollars. We can estimate it from the elapsed time, the number of CPUs, and the amount of memory.
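One possible way to turn these three inputs into a dollar figure is sketched below. The function name `estimate_cost` and both hourly prices are hypothetical; real prices depend on the cloud provider and the instance type, and nothing here is taken from the CaM code base.

```python
def estimate_cost(hours, cpus, memory_gb, usd_per_cpu_hour, usd_per_gb_hour):
    """Estimate the cost of a run in US dollars.

    Assumes a simple linear pricing model: every CPU and every GB of
    memory is billed per hour, for the whole duration of the run.
    """
    if min(hours, cpus, memory_gb, usd_per_cpu_hour, usd_per_gb_hour) < 0:
        raise ValueError("all arguments must be non-negative")
    hourly_rate = cpus * usd_per_cpu_hour + memory_gb * usd_per_gb_hour
    return hours * hourly_rate


if __name__ == "__main__":
    # Hypothetical prices, for illustration only
    cost = estimate_cost(
        hours=10, cpus=4, memory_gb=16,
        usd_per_cpu_hour=0.04, usd_per_gb_hour=0.005,
    )
    print(f"Estimated cost: ${cost:.2f}")
```

The resulting number could then be written into a file that `report.tex` reads, in whatever way the other values in the report are passed.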

enhancement
help wanted

For each metric, let's make `steps/aggregate.sh` create `data/summary/{metric}.csv` files, which will have the following structure (for example, `data/summary/LOC.csv`):

```
repository,count,sum,average,mean,min,max
yegor256/cam,28,500,45.3,48.2,1,90
yegor256/cactoos,...
yegor256/takes,...
```

Here:

* `28` is the number...

enhancement
help wanted

When packaging everything together, let's include the CaM sources too. This will make the dataset more "reproducible."

enhancement
help wanted
good first issue

It's possible to detect whether a repository is being actively maintained; for example, see this study: https://dl.acm.org/doi/abs/10.1145/3239235.3240501. Let's implement such filtering (or something similar) inside `discover-repositories.rb`. See this one...

help wanted
good first issue

Similar to #227. Let's filter out repositories that are not being maintained and are not in active development. Maybe [this study](https://dl.acm.org/doi/pdf/10.1145/3239235.3240501) can give a hint on how to do this, with...

enhancement
help wanted
good first issue

Let's improve the details of some metrics:

* DOER
* FOUT
* HSD, HSE, HSV
* MIDX
* NCSS
* NULLS

Currently, they are very sketchy, which makes it hard...

bug
help wanted
good first issue

There are already too many metrics in the repository, so it's hard to read the final report and the dataset. Let's introduce "groups". Every metric, when it's generated by a script in...

enhancement
help wanted
good first issue

Currently, we only support Java 8, because we use the [javalang](https://github.com/c2nes/javalang) library, which supports Java 8 and has gone three years without any updates. Let's find a way to either replace it or maybe...

enhancement
help wanted
good first issue

Let's filter out the smallest and the largest repositories (by the number of files), maybe using this statistical approach: https://en.wikipedia.org/wiki/Percentile
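A minimal sketch of such a percentile filter, assuming the number of files per repository is already known. The function name `filter_by_percentile` and the 5th/95th percentile cut-offs are hypothetical choices for illustration, not part of the existing scripts.

```python
from statistics import quantiles


def filter_by_percentile(file_counts, lower=5, upper=95):
    """Keep repositories whose file count lies between two percentiles.

    file_counts: dict mapping a repository name to its number of files
                 (needs at least two entries).
    lower, upper: integer percentiles (1..99) used as inclusive bounds.
    """
    if not 0 < lower < upper < 100:
        raise ValueError("expected 0 < lower < upper < 100")
    # quantiles() with n=100 returns 99 cut points; cuts[i - 1] is
    # the i-th percentile of the observed file counts.
    cuts = quantiles(file_counts.values(), n=100, method="inclusive")
    low, high = cuts[lower - 1], cuts[upper - 1]
    return {
        repo: count
        for repo, count in file_counts.items()
        if low <= count <= high
    }


if __name__ == "__main__":
    # Made-up file counts, for illustration only
    counts = {"a/tiny": 2, "b/small": 40, "c/medium": 55,
              "d/large": 70, "e/huge": 9000}
    print(sorted(filter_by_percentile(counts, lower=10, upper=90)))
```

The `inclusive` method treats the given repositories as the whole population rather than a sample, which seems closer to the intent here; the default `exclusive` method would also work, with slightly different cut-offs.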

enhancement
help wanted
good first issue