rspec-benchmark
Failure message appears to contradict itself
Describe the problem
When executing a spec that uses the perform_power matcher, I receive a failure message that appears to contradict itself.
Steps to reproduce the problem
Create a spec that uses the perform_power matcher.
Your code here to reproduce the issue
it "tests complexity" do
  expect { request }.to perform_power
end
Actual behaviour
What happened? This could be a description, log output, error raised etc.
Failure/Error: expect{request}.to perform_power
expected block to perform power, but performed power
Expected behaviour
What did you expect to happen? Either a passing test, or a failing test stating that the request did not perform power.
Describe your environment
- OS version: macOS Big Sur 11.6
- Ruby version: 2.7.4
- RSpec::Benchmark version:
rspec-benchmark (0.6.0)
Hi David,
Thank you for using rspec-benchmark and reporting this issue.
Would you be able to provide a minimal reproduction test case?
Ok, I've spent some time investigating the reasons behind this nonsensical error message.
When assessing whether the expectation matches, two things are taken into account:
- the fit type, e.g. logarithmic
- the quality of the fit, i.e. the threshold: how well the function approximates the observed trend.
The fit quality threshold is a number between 0 and 1. The default is 0.9, and values above it mean that the fit is very good. This value can be changed globally or per test. For example, to lower it to 0.8 you can do:
it "tests complexity" do
  expect { request }.to perform_power.threshold(0.8)
end
So the message "expected block to perform power, but performed power" means that the best-fitting trend was indeed power, but its fit quality was below the default threshold of 0.9.
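To make the two-part check concrete, here is a minimal sketch of how such a decision could produce the confusing message. `FitCheck`, `complexity_matches?` and `terse_failure_message` are hypothetical names for illustration, not rspec-benchmark's actual internals, and treating a quality exactly equal to the threshold as a pass is an assumption.

```ruby
# Hypothetical stand-in for the matcher's decision; not rspec-benchmark's real code.
FitCheck = Struct.new(:trend, :quality)

# Passes only when the best-fitting trend is the expected one AND the fit
# quality reaches the threshold (boundary handling is an assumption).
def complexity_matches?(expected_trend, fit, threshold: 0.9)
  fit.trend == expected_trend && fit.quality >= threshold
end

# Mirrors the current message shape: it reports only the trend name,
# so a quality failure looks like a contradiction.
def terse_failure_message(expected_trend, fit)
  "expected block to perform #{expected_trend}, but performed #{fit.trend}"
end

fit = FitCheck.new(:power, 0.87)
puts complexity_matches?(:power, fit)                  # fails at the default 0.9
puts complexity_matches?(:power, fit, threshold: 0.8)  # passes at 0.8
puts terse_failure_message(:power, fit)
```

The trend names match, so the only reason for the failure (0.87 being under 0.9) never reaches the message.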
Why is this even taken into account? A poor fit quality means that the measured values are hard to approximate with any trend line, so the reported complexity is only the best available estimate. It would be better to 'improve' the test to get a more definite approximation and thus gain confidence in the measured complexity.
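As a rough illustration of what "fit quality" can mean, the sketch below fits y = a * x^b by least squares on the log-transformed data and reports the coefficient of determination (R²) as the quality. This is an assumption about how such a score might be computed, not a description of what rspec-benchmark does internally; `fit_power` and the sample timings are made up for the example.

```ruby
# Fit y = a * x**b via linear regression on (ln x, ln y).
# Returns [a, b, r_squared], where r_squared is measured in log space.
def fit_power(xs, ys)
  lx = xs.map { |x| Math.log(x) }
  ly = ys.map { |y| Math.log(y) }
  n = lx.size.to_f
  mean_x = lx.sum / n
  mean_y = ly.sum / n

  sxy = lx.zip(ly).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = lx.sum { |x| (x - mean_x)**2 }
  slope = sxy / sxx
  intercept = mean_y - slope * mean_x

  # R^2 = 1 - (residual sum of squares / total sum of squares)
  ss_res = lx.zip(ly).sum { |x, y| (y - (intercept + slope * x))**2 }
  ss_tot = ly.sum { |y| (y - mean_y)**2 }

  [Math.exp(intercept), slope, 1.0 - ss_res / ss_tot]
end

sizes = [8, 64, 512, 4096]
clean = sizes.map { |n| 0.001 * n**2 } # exactly quadratic timings
noisy = [0.9, 0.2, 1.5, 0.7]           # hypothetical, erratic timings

_, clean_exponent, clean_quality = fit_power(sizes, clean)
_, _, noisy_quality = fit_power(sizes, noisy)

puts "clean: exponent=#{clean_exponent.round(2)} quality=#{clean_quality.round(3)}"
puts "noisy: quality=#{noisy_quality.round(3)}"
```

A power curve can always be drawn through the noisy measurements, but its low score shows that the curve explains little of the variation, which is the situation the threshold is meant to catch.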
Now, this is not ideal, and can be resolved in two ways:
- Expand the error message to include fit quality. For example,
expected block to perform power above 0.9 fit quality, but performed power at 0.87 fit quality
- Remove the threshold from the equation and only compare the trend line.
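The first option could look roughly like the sketch below. `MeasuredFit` and `detailed_failure_message` are hypothetical names used only to show the message shape; this is not an existing rspec-benchmark method.

```ruby
# Hypothetical value object holding the best-fitting trend and its quality.
MeasuredFit = Struct.new(:trend, :quality)

# Builds a failure message that reports both the trend and the fit quality,
# following the wording proposed in the first option above.
def detailed_failure_message(expected_trend, threshold, fit)
  "expected block to perform #{expected_trend} above #{threshold} fit quality, " \
    "but performed #{fit.trend} at #{fit.quality} fit quality"
end

puts detailed_failure_message(:power, 0.9, MeasuredFit.new(:power, 0.87))
# prints: expected block to perform power above 0.9 fit quality, but performed power at 0.87 fit quality
```

With both numbers in the message, a reader can see at once that the trend matched and only the quality fell short.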
I'm reluctant to remove the threshold because such tests may become very brittle. At low fit quality, the trend line is hard to estimate and can change with each test run. We want high confidence. Hence, I'm more inclined to improve the message and 'educate' users about this parameter. Any thoughts?