simplecov icon indicating copy to clipboard operation
simplecov copied to clipboard

Allow Overall Statistics Calculation without loading the files that are covered

Open jgonera opened this issue 7 years ago • 5 comments

Let's say I have a .resultset.json file but no source code. This file should be enough to generate some basics stats, like total code coverage, total number of lines, etc.

It is not possible now because SimpleCov refuses to record any data if the source code is not present: https://github.com/colszowka/simplecov/blob/9bb1aa3a6735b16d4026c4c34db0de6973d9a883/lib/simplecov/result.rb#L29

This is unnecessary and very problematic for distributed CI systems. Even if the source code is there, it's even worse for big code bases because to calculate the total coverage, SimpleCov loads every source file: https://github.com/colszowka/simplecov/blob/05d877d1956bed364dfa4018edbc50c213ac94fe/lib/simplecov/source_file.rb#L83-L89 (and other parts of this class).

Is there any technical reason for this that I'm missing?

jgonera avatar Apr 14 '17 23:04 jgonera

Hi there,

the major technical reason is that the goal of SimpleCov is to show which lines are covered and which aren't - having some overall statistics is fine but they don't help you much (imo) if you can't have a look which lines are covered and which aren't as you can't go and say "Ok maybe we should test MyClass#critical_method as the tests seem to be missing it"

I believe this is the main reason this isn't happening - if you want to do a PR fixing this feel welcome but personally I doubt we'll tackle this soon :)

PragTob avatar Apr 15 '17 07:04 PragTob

+1 on what Tobi said, in most cases you do indeed need the source to gather actionable results. That being said, since the source files are read lazily (the lines you linked to in source file) as long as you do not actually access the source, you could generate results without hitting the file system. I would suggest you create your own formatter for this that only prints the overall numbers to achieve this.

colszowka avatar Apr 15 '17 08:04 colszowka

related: https://github.com/colszowka/simplecov/pull/558#discussion_r103256709

I found that the files list wasn't set correctly if I was trying to load and merge the results from another machine.
Probably good for a future pr... but doesn't work with html formatter because it tries to read each file to overlay coverage in report source_file.rb

 def src
      # We intentionally read source code lazily to
      # suppress reading unused source code.
      @src ||= File.open(filename, "rb", &:readlines)
    end
require 'simplecov'
require 'json'
SimpleCov.coverage_dir 'coveragetest'
SimpleCov.formatters = [SimpleCov::Formatter::SimpleFormatter]
sourcefiles = Dir.glob('./resultset*') 
# work around  `if File.file?(filename)` when running on other filesystem
add_files = ->(sr) { 
  files = sr.
    original_result.
    map {|filename, coverage| SimpleCov::SourceFile.new(filename, coverage) }.
    compact.
    sort_by(&:filename);
  sr.instance_variable_set(:@files, SimpleCov::FileList.new(files));
  sr 
}

results = sourcefiles.
  map { |file| SimpleCov::Result.from_hash(JSON.parse(File.read(file))) }.
  each do |result| add_files.(result); end; nil
result = SimpleCov::ResultMerger.merge_results(*results)
add_files.(result)
result.format!

bf4 avatar Apr 15 '17 20:04 bf4

@bf4 What I do now:

repo_dir = "/some/machine/specific/path"
repo_dir_escaped = repo_dir.gsub("/", "\\/")
resultset_glob = File.join(coverage_path, "**", ".resultset.json")
system("sed -i 's/#{repo_dir_escaped}\\///g' #{resultset_glob}")

This makes the paths relative. Then when I merge results on one of the machines I first Dir.chdir to where the source code is checked out, do SimpleCov.filters.clear (because I don't want to filter relative paths by SimpleCov.root) and then run the merging as described in https://github.com/colszowka/simplecov/issues/219#issuecomment-293481328.

This feels quite shaky, but that's another problem which seems to highlight that we might want SimpleCov to have a broader public API instead of relying on its internals (https://github.com/codeclimate/ruby-test-reporter/pull/181 also seems to highlight this).

That being said, since the source files are read lazily (the lines you linked to in source file) as long as you do not actually access the source, you could generate results without hitting the file system.

@colszowka They kind of are, but how does that matter if the loading always needs to happen? As I said, I'd like to know total coverage for a given SHA tested in my CI. Looking at the source code, this seems to be the case:

Instead of this happening, we could just get those line counts (covered_lines and missed_lines) from .resultset.json without loading anything.

I know that it is useful to show which lines are covered and which not, but that could be done only for files that I request (e.g. files that I know changed in a given commit/PR), without loading every single file just to get a metric of overall code coverage.

jgonera avatar Apr 17 '17 22:04 jgonera

It just occurred to me that the reason we are reading all files is because we need to look for the nocov comments to accurately report coverage.

If we were to implement this it'd probably need to be some separate method or what not that explicitly doesn't support the nocov commments, which makes this add quite some complexity. On the other hand, it'd just deal with raw coverage results which might be easier.

PragTob avatar Dec 09 '19 10:12 PragTob