using variable inside block of type check leads to "stack level too deep (SystemStackError)"
Describe the Bug
I wrote a catch_errors call followed by a type check against Error (=~ Error), exactly as shown in the Bolt docs section "Catching errors in plans". If I use the variable holding the caught error inside the block guarded by the type check, the plan crashes with "stack level too deep (SystemStackError)". The goal is to nest the original error inside the new one for deeper debugging.
Expected Behavior
The plan doesn't crash. It returns the custom error defined in the fail_plan call, and that error can include the original error in its details hash.
Steps to Reproduce
plan somemodule::test() {
  $t = Target.new('some-target')
  $group = 'some-nonexistent-group'
  $add_or_error = catch_errors(['bolt.inventory/validation-error']) || { $t.add_to_group($group) }
  out::message("add_or_error: ${add_or_error}")
  if $add_or_error =~ Error {
    fail_plan('group not found in inventory', 'inventory/not-found', { 'group' => $group, 'error' => $add_or_error }) # this fails
    # fail_plan('group not found in inventory', 'inventory/not-found', { 'group' => $group }) # this works but we are missing the original error
  }
  return 'it did it!'
}
Environment
- Version 4.0.0 (also happens in 3.30.0)
- Platform Devuan daedalus (debian bookworm)
Additional Context
$ bolt plan run somemodule::test --format json
add_or_error: Error({'msg' => 'Group some-nonexistent-group does not exist in inventory', 'kind' => 'bolt.inventory/validation-error', 'details' => {'path' => []}})
/opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/atomic/thread_local_var.rb:71:in `value': stack level too deep (SystemStackError)
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/context.rb:56:in `lookup'
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet.rb:284:in `lookup'
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/loaders.rb:120:in `implementation_registry'
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/types/type_calculator.rb:536:in `infer_Object'
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/visitor.rb:94:in `visit_this_0'
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/types/type_calculator.rb:270:in `infer'
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/types/type_calculator.rb:289:in `infer_set'
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/types/type_calculator.rb:160:in `infer_set'
... 8080 levels...
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/bolt-4.0.0/lib/bolt/cli.rb:394:in `execute'
from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/bolt-4.0.0/exe/bolt:11:in `<top (required)>'
from /opt/puppetlabs/bolt/bin/bolt:25:in `load'
from /opt/puppetlabs/bolt/bin/bolt:25:in `<main>'
I saw this issue too. ~~In my experience, it is the presence of the error key in the details hash that causes this, not the variable.~~ Though I don't remember exactly.
Well, I tried it with a simple plan: the error key by itself is fine, but the plan fails whenever a value in the details hash is an Error() object. A minimal reproducer is below:
plan bolt_3373 {
  fail_plan(
    'That should fail properly',
    'test/failure',
    foo => Error('foo'),
  )
}
That raises exactly the same issue:
/opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/atomic/thread_local_var.rb:71:in `value': stack level too deep (SystemStackError)
Confirmed on macOS/aarch64 with Bolt 4.0.0.
I was able to reproduce this bug and did some investigation.
The issue occurs due to an infinite recursion during JSON serialization. Here's how it happens:
- When fail_plan is called with an Error object in the details hash, it creates a Bolt::PlanFailure error object.
- When this error is processed, the PlanResult class (in plan_result.rb) handles the result. Looking at the PlanResult class, we can see it has a to_json method that simply calls @value.to_json(*args).
- The Error class (in error.rb) implements a to_json method which calls to_h.to_json(opts).
- The to_h method includes the details hash with 'details' => details.
- When an Error object is included inside the details hash, and that Error object gets serialized to JSON:
  - The outer Error calls to_json
  - Which calls to_h
  - Which includes the inner Error in the details hash
  - The inner Error then calls to_json
  - Which calls to_h
  - Which might reference the outer Error again, or simply continue nesting
  - This creates an infinite recursion loop
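To make the call chain above easier to follow, here is a minimal Ruby sketch of the to_json / to_h delegation. SketchError is a hypothetical stand-in written only for illustration. It is not Bolt's or Puppet's actual Error class, and the real code path may differ. In this simplified model each nested error adds one more to_json / to_h round trip, so a finite chain of nested errors still serializes; only a details hash that leads back to an error already being serialized would recurse without bound.

```ruby
require 'json'

# Hypothetical stand-in for the error class described above.
# This is an illustration only, not Bolt's actual implementation.
class SketchError
  attr_reader :msg, :kind, :details

  def initialize(msg, kind, details = {})
    @msg = msg
    @kind = kind
    @details = details
  end

  # Mirrors the described to_h: the details hash is included as-is,
  # so any error object stored in it is carried along unconverted.
  def to_h
    { 'msg' => msg, 'kind' => kind, 'details' => details }
  end

  # Mirrors the described to_json: delegate to the hash form.
  # Serializing the hash calls to_json on each nested SketchError in turn.
  def to_json(*args)
    to_h.to_json(*args)
  end
end

inner = SketchError.new(
  'Group some-nonexistent-group does not exist in inventory',
  'bolt.inventory/validation-error'
)
outer = SketchError.new(
  'group not found in inventory',
  'inventory/not-found',
  { 'group' => 'some-nonexistent-group', 'error' => inner }
)

# One level of nesting: outer.to_json -> outer.to_h -> inner.to_json -> inner.to_h
parsed = JSON.parse(outer.to_json)
puts JSON.pretty_generate(parsed)
```

The parsed result keeps the original error under details.error, which is the shape the report asks for in "Expected Behavior".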
If you agree with the analysis, I can follow up with a PR to resolve this. Please let me know.
(In the interest of transparency: I'm building a tool that debugs issues. I set up the Bolt repo, reproduced the issue, and ran my tool on it. I have also traced the code and can confirm the analysis.)