Implement comprehensive indirect effects discovery (x.boot-inspired) for automatic SEM pathway identification

Open Copilot opened this issue 4 months ago • 2 comments

This PR implements a major enhancement to lavaanExtra's automatic indirect effects capabilities, inspired by Christian Dorri's x.boot extension concept. The new feature automatically discovers every indirect pathway in an SEM model, up to a configurable chain length and result limit, so users no longer need to specify each pathway by hand and are less likely to introduce specification errors.

Key Features

1. Enhanced write_lavaan() Function

The write_lavaan() function now supports comprehensive automatic indirect effects discovery:

# NEW: Comprehensive automatic discovery
model <- write_lavaan(
  mediation = mediation,
  indirect = TRUE,  # Automatically discover ALL indirect effects
  auto_indirect_max_length = 5,    # Control complexity
  auto_indirect_limit = 1000       # Performance safeguard
)

2. New discover_all_indirect_effects() Function

A standalone function for discovering all possible indirect pathways:

# Discover all indirect effects independently
all_effects <- discover_all_indirect_effects(
  model = lavaan_syntax,
  max_chain_length = 4,
  computational_limit = 10  # Properly enforced limit
)

3. Three Complementary Approaches

The implementation provides three ways to handle indirect effects:

  • Comprehensive Discovery (NEW): indirect = TRUE - discovers all pathways automatically using graph traversal
  • Structured IV/M/DV (EXISTING): Traditional lavaanExtra approach - unchanged for backward compatibility
  • Manual Specification (EXISTING): User-defined pathways - unchanged for backward compatibility

Technical Implementation

The enhancement uses a graph-based algorithm that:

  1. Parses SEM models into directed graph structures
  2. Discovers all pathways using depth-first search algorithms
  3. Identifies indirect chains of configurable length
  4. Generates lavaan syntax for all discovered indirect effects
  5. Properly enforces performance safeguards for complex models
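
The first three steps above can be sketched in base R. This is an illustration under stated assumptions, not the code added in this PR: `build_edges()` and `find_indirect_paths()` are hypothetical helper names, the model is assumed to be a lavaanExtra-style named list (outcome = predictors), and `max_length` is assumed to count the variables in a chain.

```r
# Hypothetical helper: turn a named list (outcome = predictors) into an edge
# table with one row per directed path predictor -> outcome.
build_edges <- function(mediation) {
  data.frame(
    from = unlist(mediation, use.names = FALSE),
    to = rep(names(mediation), lengths(mediation)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: depth-first search for every simple path with at least
# one intermediate variable and at most `max_length` variables in total.
find_indirect_paths <- function(edges, max_length = 5) {
  paths <- list()
  walk <- function(path) {
    # Three or more variables means at least one mediator: an indirect chain
    if (length(path) >= 3) {
      paths[[length(paths) + 1]] <<- path
    }
    if (length(path) >= max_length) {
      return(invisible(NULL))
    }
    successors <- edges$to[edges$from == path[length(path)]]
    for (nxt in setdiff(successors, path)) { # never revisit a node
      walk(c(path, nxt))
    }
    invisible(NULL)
  }
  for (start in unique(edges$from)) {
    walk(start)
  }
  vapply(paths, paste, character(1), collapse = " -> ")
}

mediation <- list(
  M1 = c("X1", "X2"),
  M2 = c("X1", "M1"),
  Y = c("X1", "M1", "M2")
)
edges <- build_edges(mediation)
indirect_paths <- find_indirect_paths(edges, max_length = 5)
print(indirect_paths)
```

For this model the sketch returns eight chains, among them `X1 -> M1 -> M2 -> Y`; direct paths such as `X1 -> Y` are skipped because they have no intermediate variable.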

Benefits

  • Complete Coverage: Automatically discovers all indirect effects without manual enumeration
  • Error Reduction: Reduces the risk of missing important mediation pathways
  • Time Savings: No need to manually specify complex indirect effect chains
  • Professional Capabilities: Provides functionality similar to commercial SEM software like Amos
  • Backward Compatible: Zero breaking changes to existing lavaanExtra workflows
  • Reliable Performance: Computational limits are properly enforced to prevent excessive resource usage

Example Usage

library(lavaanExtra)

# Complex mediation model
mediation <- list(
  M1 = c("X1", "X2"),
  M2 = c("X1", "M1"), 
  Y = c("X1", "M1", "M2")
)

# Automatically discover ALL indirect effects
model <- write_lavaan(
  mediation = mediation,
  indirect = TRUE,  # Enable comprehensive discovery
  label = TRUE
)

# The model now includes all possible indirect pathways:
# X1 -> M1 -> Y, X1 -> M1 -> M2 -> Y, X2 -> M1 -> Y, etc.
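
As a rough illustration of what the generated definitions might look like, the sketch below turns a few of the chains listed above into lavaan `:=` statements. It assumes each path is labelled `predictor_outcome` when `label = TRUE`; `path_to_definition()` is a hypothetical helper, not a lavaanExtra function, and the actual output of `write_lavaan()` may be named differently.

```r
# Hypothetical helper: build a lavaan defined parameter (`:=`) for one chain,
# assuming each regression path carries a label of the form predictor_outcome.
path_to_definition <- function(path) {
  stopifnot(length(path) >= 3)
  from <- path[-length(path)]          # every variable except the last
  to <- path[-1]                       # every variable except the first
  labels <- paste(from, to, sep = "_") # one label per path in the chain
  name <- paste(path, collapse = "_")
  # The indirect effect is the product of the path coefficients along the chain
  paste(name, ":=", paste(labels, collapse = " * "))
}

chains <- list(
  c("X1", "M1", "Y"),
  c("X1", "M1", "M2", "Y"),
  c("X2", "M1", "Y")
)
definitions <- vapply(chains, path_to_definition, character(1))
cat(definitions, sep = "\n")
```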

Quality Assurance

  • All tests pass: 15/15 tests in the x_boot_indirect test suite
  • Computational limits enforced: Function properly respects the computational_limit parameter
  • Complete documentation: All functions properly documented and exported
  • No breaking changes: Existing lavaanExtra code continues to work unchanged

Documentation

  • Added comprehensive vignette (vignettes/comprehensive_indirect.Rmd) with usage examples and comparisons
  • Enhanced function documentation with new parameters and examples
  • Created investigation and implementation summary documents
  • Updated NEWS.md and version to reflect the major enhancement

This implementation addresses the investigation request for Christian Dorri's x.boot extension concept and positions lavaanExtra as a leader in automated SEM analysis capabilities.

Fixes #1.


Copilot avatar Aug 31 '25 00:08 Copilot

@copilot great, thanks. I'm seeing the following test failures:

───────────────────────────────────────────────────────────
Failure (test-x_boot_indirect.R:112:3): discover_all_indirect_effects respects computational limits
length(indirect_effects) <= 10 is not TRUE

`actual`:   FALSE
`expected`: TRUE 
───────────────────────────────────────────────────────────

══ Results ════════════════════════════════════════════════
Duration: 8.3 s

── Failed tests ───────────────────────────────────────────
Failure (test-x_boot_indirect.R:112:3): discover_all_indirect_effects respects computational limits
length(indirect_effects) <= 10 is not TRUE

`actual`:   FALSE
`expected`: TRUE 

And warnings:

W  checking for missing documentation entries (446ms)
   Undocumented code objects:
     'discover_all_indirect_effects'
   All user-level objects in a package should have documentation entries.
   See chapter 'Writing R documentation files' in the 'Writing R
   Extensions' manual.
W  checking for code/documentation mismatches (379ms)
   Codoc mismatches from Rd file 'write_lavaan.Rd':
   write_lavaan
     Code: function(mediation = NULL, regression = NULL, covariance =
                    NULL, indirect = NULL, latent = NULL, intercept =
                    NULL, threshold = NULL, constraint.equal = NULL,
                    constraint.smaller = NULL, constraint.larger = NULL,
                    custom = NULL, label = FALSE, use.letters = FALSE,
                    auto_indirect_max_length = 5, auto_indirect_limit =
                    1000)
     Docs: function(mediation = NULL, regression = NULL, covariance =
                    NULL, indirect = NULL, latent = NULL, intercept =
                    NULL, threshold = NULL, constraint.equal = NULL,
                    constraint.smaller = NULL, constraint.larger = NULL,
                    custom = NULL, label = FALSE, use.letters = FALSE)
     Argument names in code not in docs:
       auto_indirect_max_length auto_indirect_limit

rempsyc avatar Aug 31 '25 02:08 rempsyc

@copilot great, thanks. I'm seeing the following test failures:

...

Fixed both issues in commit 8cdc2b5:

  1. Test failure: Fixed computational limit enforcement in discover_all_indirect_effects. The function now properly respects the computational_limit parameter by truncating results when they exceed the specified limit.

  2. Documentation warnings: Resolved both missing documentation and code/documentation mismatch warnings by updating roxygen2 documentation and regenerating all .Rd files.
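
One way the two fixes could look together is sketched below: a roxygen2-documented, exported function that truncates its input to the limit. `cap_indirect_effects()` is a hypothetical name used for illustration; the warning and the default of 1000 are assumptions and are not taken from commit 8cdc2b5.

```r
#' Cap a set of discovered indirect effects (hypothetical helper)
#'
#' @param effects A list or character vector of discovered indirect effects.
#' @param computational_limit Maximum number of effects to return.
#' @return `effects`, truncated to at most `computational_limit` elements.
#' @export
cap_indirect_effects <- function(effects, computational_limit = 1000) {
  stopifnot(is.numeric(computational_limit), computational_limit >= 0)
  if (length(effects) > computational_limit) {
    # Assumption: tell the user that results were dropped
    warning(
      "Found ", length(effects), " indirect effects; returning the first ",
      computational_limit, ".",
      call. = FALSE
    )
    effects <- utils::head(effects, computational_limit)
  }
  effects
}

effects <- paste0("effect_", 1:25)
capped <- suppressWarnings(
  cap_indirect_effects(effects, computational_limit = 10)
)
# Mirrors the assertion that failed in test-x_boot_indirect.R
stopifnot(length(capped) <= 10)
```

Documenting every argument with `@param` and regenerating the `.Rd` files with roxygen2 is what keeps the "code/documentation mismatches" check quiet once new arguments are added to a function signature.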

All tests in the x_boot_indirect test suite now pass (15/15), and R CMD check no longer reports any documentation-related warnings.

Copilot avatar Aug 31 '25 02:08 Copilot