simplecov
simplecov copied to clipboard
Allow Overall Statistics Calculation without loading the files that are covered
Let's say I have a .resultset.json
file but no source code. This file should be enough to generate some basics stats, like total code coverage, total number of lines, etc.
It is not possible now because SimpleCov refuses to record any data if the source code is not present: https://github.com/colszowka/simplecov/blob/9bb1aa3a6735b16d4026c4c34db0de6973d9a883/lib/simplecov/result.rb#L29
This is unnecessary and very problematic for distributed CI systems. Even if the source code is there, it's even worse for big code bases because to calculate the total coverage, SimpleCov loads every source file: https://github.com/colszowka/simplecov/blob/05d877d1956bed364dfa4018edbc50c213ac94fe/lib/simplecov/source_file.rb#L83-L89 (and other parts of this class).
Is there any technical reason for this that I'm missing?
Hi there,
the major technical reason is that the goal of SimpleCov is to show which lines are covered and which aren't - having some overall statistics is fine but they don't help you much (imo) if you can't have a look which lines are covered and which aren't as you can't go and say "Ok maybe we should test MyClass#critical_method
as the tests seem to be missing it"
I believe this is the main reason this isn't happening - if you want to do a PR fixing this feel welcome but personally I doubt we'll tackle this soon :)
+1 on what Tobi said, in most cases you do indeed need the source to gather actionable results. That being said, since the source files are read lazily (the lines you linked to in source file) as long as you do not actually access the source, you could generate results without hitting the file system. I would suggest you create your own formatter for this that only prints the overall numbers to achieve this.
related: https://github.com/colszowka/simplecov/pull/558#discussion_r103256709
I found that the
files
list wasn't set correctly if I was trying to load and merge the results from another machine.
Probably good for a future pr... but doesn't work with html formatter because it tries to read each file to overlay coverage in report source_file.rb
def src
# We intentionally read source code lazily to
# suppress reading unused source code.
@src ||= File.open(filename, "rb", &:readlines)
end
require 'simplecov'
require 'json'
SimpleCov.coverage_dir 'coveragetest'
SimpleCov.formatters = [SimpleCov::Formatter::SimpleFormatter]
sourcefiles = Dir.glob('./resultset*')
# work around `if File.file?(filename)` when running on other filesystem
add_files = ->(sr) {
files = sr.
original_result.
map {|filename, coverage| SimpleCov::SourceFile.new(filename, coverage) }.
compact.
sort_by(&:filename);
sr.instance_variable_set(:@files, SimpleCov::FileList.new(files));
sr
}
results = sourcefiles.
map { |file| SimpleCov::Result.from_hash(JSON.parse(File.read(file))) }.
each do |result| add_files.(result); end; nil
result = SimpleCov::ResultMerger.merge_results(*results)
add_files.(result)
result.format!
@bf4 What I do now:
repo_dir = "/some/machine/specific/path"
repo_dir_escaped = repo_dir.gsub("/", "\\/")
resultset_glob = File.join(coverage_path, "**", ".resultset.json")
system("sed -i 's/#{repo_dir_escaped}\\///g' #{resultset_glob}")
This makes the paths relative. Then when I merge results on one of the machines I first Dir.chdir
to where the source code is checked out, do SimpleCov.filters.clear
(because I don't want to filter relative paths by SimpleCov.root
) and then run the merging as described in https://github.com/colszowka/simplecov/issues/219#issuecomment-293481328.
This feels quite shaky, but that's another problem which seems to highlight that we might want SimpleCov to have a broader public API instead of relying on its internals (https://github.com/codeclimate/ruby-test-reporter/pull/181 also seems to highlight this).
That being said, since the source files are read lazily (the lines you linked to in source file) as long as you do not actually access the source, you could generate results without hitting the file system.
@colszowka They kind of are, but how does that matter if the loading always needs to happen? As I said, I'd like to know total coverage for a given SHA tested in my CI. Looking at the source code, this seems to be the case:
- I call
SimpleCov::Result#covered_percent
which delegates toSimpleCov::FileList
. -
covered_percent
iscovered_lines / lines_of_code
. -
lines_of_code
iscovered_lines + missed_lines
-
covered_lines
andmissed_lines
(note: these methods are not memoized andcovered_lines
is called twice) go over the entire list of files and callSimpleCov::SourceFile#covered_lines
andSimpleCov::SourceFile#missed_lines
. -
SimpleCov::SourceFile#covered_lines
andSimpleCov::SourceFile#missed_lines
callSimpleCov::SourceFile#lines
which in turn calls#build_lines
. -
SimpleCov::SourceFile#build_lines
calls#src
which loads the entire source file in memory. - This happens for all source files, which means thousands of files for my code base.
Instead of this happening, we could just get those line counts (covered_lines
and missed_lines
) from .resultset.json
without loading anything.
I know that it is useful to show which lines are covered and which not, but that could be done only for files that I request (e.g. files that I know changed in a given commit/PR), without loading every single file just to get a metric of overall code coverage.
It just occurred to me that the reason we are reading all files is because we need to look for the nocov comments to accurately report coverage.
If we were to implement this it'd probably need to be some separate method or what not that explicitly doesn't support the nocov commments, which makes this add quite some complexity. On the other hand, it'd just deal with raw coverage results which might be easier.