using variable inside block of type check leads to "stack level too deep (SystemStackError)"

Open dionysius opened this issue 11 months ago • 3 comments

Describe the Bug

I've written a catch_errors call followed by a type check with =~ Error, exactly as described in the Bolt docs under "Catching errors in plans". If I use the variable inside the type-check block, the plan crashes with "stack level too deep (SystemStackError)". The idea is to nest the original error inside the new one for deeper debugging.

Expected Behavior

The plan doesn't crash. It returns the custom error defined in the fail_plan line, with the original error included in the new error's details hash.

Steps to Reproduce

plan somemodule::test() {
  $t = Target.new('some-target')
  $group = 'some-nonexistent-group'
  $add_or_error = catch_errors(['bolt.inventory/validation-error']) || { $t.add_to_group($group) }
  out::message("add_or_error: ${add_or_error}")
  if $add_or_error =~ Error {
    fail_plan('group not found in inventory', 'inventory/not-found', { 'group' => $group, 'error' => $add_or_error }) # this fails
    # fail_plan('group not found in inventory', 'inventory/not-found', { 'group' => $group }) # this works but we are missing the original error
  }

  return 'it did it!'
}

Environment

  • Version 4.0.0 (also happens in 3.30.0)
  • Platform Devuan daedalus (debian bookworm)

Additional Context

$ bolt plan run somemodule::test --format json
add_or_error: Error({'msg' => 'Group some-nonexistent-group does not exist in inventory', 'kind' => 'bolt.inventory/validation-error', 'details' => {'path' => []}})
/opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/atomic/thread_local_var.rb:71:in `value': stack level too deep (SystemStackError)
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/context.rb:56:in `lookup'
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet.rb:284:in `lookup'
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/loaders.rb:120:in `implementation_registry'
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/types/type_calculator.rb:536:in `infer_Object'
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/visitor.rb:94:in `visit_this_0'
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/types/type_calculator.rb:270:in `infer'
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/types/type_calculator.rb:289:in `infer_set'
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/puppet-8.10.0/lib/puppet/pops/types/type_calculator.rb:160:in `infer_set'
         ... 8080 levels...
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/bolt-4.0.0/lib/bolt/cli.rb:394:in `execute'
        from /opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/bolt-4.0.0/exe/bolt:11:in `<top (required)>'
        from /opt/puppetlabs/bolt/bin/bolt:25:in `load'
        from /opt/puppetlabs/bolt/bin/bolt:25:in `<main>'

dionysius avatar Jan 22 '25 20:01 dionysius

I saw this issue also. ~~From my experience, the presence of the error key in the details hash caused that, not a variable.~~ Though, I don't remember exactly.

jay7x avatar Feb 08 '25 06:02 jay7x

Well, I tried with a simple plan: it's fine with the error key, but fails on any Error() object value. The minimal reproducer is below:

plan bolt_3373 {
  fail_plan(
    'That should fail properly',
    'test/failure',
    foo => Error('foo'),
  )
}

That raises exactly the same issue:

/opt/puppetlabs/bolt/lib/ruby/gems/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/atomic/thread_local_var.rb:71:in `value': stack level too deep (SystemStackError)

Confirmed on macOS/aarch64 with Bolt 4.0.0.

jay7x avatar Feb 08 '25 06:02 jay7x

I was able to reproduce this bug and did some investigation.

The issue occurs due to an infinite recursion during JSON serialization. Here's how it happens:

  1. When fail_plan is called with an Error object in the details hash, it creates a Bolt::PlanFailure error object.

  2. When this error is processed, the PlanResult class (in plan_result.rb) handles the result. Looking at the PlanResult class, we can see it has a to_json method that simply calls @value.to_json(*args).

  3. The Error class (in error.rb) implements a to_json method which calls to_h.to_json(opts).

  4. The to_h method includes the details hash with 'details' => details.

  5. When an Error object is included inside the details hash, and that Error object gets serialized to JSON:

    • The outer Error calls to_json
    • Which calls to_h
    • Which includes the inner Error in the details hash
    • The inner Error then calls to_json
    • Which calls to_h
    • Which might reference the outer Error again, or simply continue nesting
    • This creates an infinite recursion loop
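
The delegation order in steps 3 to 5 can be sketched in plain Ruby. This is only an illustration under stated assumptions: `SketchError` and `CALLS` are hypothetical names made up for this example, and none of it is copied from Bolt's error.rb or plan_result.rb.

```ruby
require 'json'

# Records the order of calls so the delegation chain is visible.
CALLS = []

# Hypothetical stand-in for the error class described above (not Bolt's code).
class SketchError
  def initialize(msg, kind, details = {})
    @msg = msg
    @kind = kind
    @details = details
  end

  # Step 4: the details hash is embedded as-is, nested objects included.
  def to_h
    CALLS << "#{@msg}.to_h"
    { 'kind' => @kind, 'msg' => @msg, 'details' => @details }
  end

  # Step 3: serialization delegates to to_h, then to Hash#to_json.
  def to_json(*args)
    CALLS << "#{@msg}.to_json"
    to_h.to_json(*args)
  end
end

inner = SketchError.new('inner', 'test/inner')
outer = SketchError.new('outer', 'test/outer', { 'error' => inner })

puts outer.to_json   # serialized outer error with the inner one nested in details
puts CALLS.inspect   # order in which to_json and to_h were invoked
```

In this simplified model a finite nesting serializes and terminates, so the sketch shows only the outer-to-inner call order. It does not reproduce the overflow itself.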

If you agree with the analysis, I can follow up with a PR to resolve this. Please let me know.

(For transparency: I'm personally building a tool that can debug issues. So I've set up the Bolt repo, reproduced the issue, and run my tool. I've traced the code and can confirm the analysis.)

priyankc avatar May 08 '25 18:05 priyankc