it-depends

Add feature to detect if a package is unmaintained

Open • securingdev opened this issue 1 month ago • 1 comment

Add Package Maintenance Status Checking to SBOM

Summary

This PR adds functionality to check when packages in the Software Bill of Materials (SBOM) were last maintained, helping users identify stale or unmaintained dependencies in their software supply chain.

Motivation

Unmaintained dependencies pose significant security and reliability risks. This feature enables users to:

  • Identify packages that haven't been updated in months or years
  • Make informed decisions about dependency health
  • Proactively address supply chain security concerns
  • Comply with software supply chain security requirements

Key Features

🔍 Maintenance Checking

  • Queries the GitHub API to determine the last commit date for each package
  • Flags packages as "stale" based on configurable threshold (default: 365 days)
  • Supports npm, pip, cargo, and go package managers
  • Non-blocking: packages without GitHub repos are skipped gracefully
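
The staleness rule above comes down to a date comparison. A minimal sketch, assuming the last commit date arrives as an ISO 8601 UTC string and that a package counts as stale once strictly more than the threshold number of days have passed; the names `days_since` and `is_stale` are illustrative and are not taken from the PR:

```python
from datetime import datetime, timezone
from typing import Optional


def days_since(last_commit_iso: str, now: Optional[datetime] = None) -> int:
    """Return whole days elapsed since an ISO 8601 timestamp."""
    # Normalize a trailing "Z" so fromisoformat() also works before Python 3.11.
    last_commit = datetime.fromisoformat(last_commit_iso.replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_commit).days


def is_stale(last_commit_iso: str, threshold_days: int = 365,
             now: Optional[datetime] = None) -> bool:
    """Assumed rule: stale when strictly more than threshold_days have passed."""
    return days_since(last_commit_iso, now) > threshold_days
```

Passing `now` explicitly keeps the check deterministic in tests. With `now` fixed at 2023-09-19T14:30:00Z, a last commit at 2023-05-22T14:30:00Z is 120 days old and is not stale under the default threshold.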

⚡ Performance & Reliability

  • Parallel processing with progress bar feedback
  • 24-hour caching of GitHub metadata to minimize API calls
  • Rate limit tracking and warnings
  • Supports both authenticated (5000 req/hr) and unauthenticated (60 req/hr) GitHub API access
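
Rate limit tracking can be driven by the `x-ratelimit-remaining` and `x-ratelimit-limit` headers that GitHub returns with REST API responses. A rough sketch of a warning helper follows; the function name, the `warn_below` cutoff, and the message wording are assumptions for illustration, not the PR's actual code:

```python
from typing import Optional


def rate_limit_warning(headers: dict, warn_below: int = 10) -> Optional[str]:
    """Return a warning string when few API requests remain, else None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining")
    if remaining is None:
        return None  # no rate limit information in this response
    if int(remaining) < warn_below:
        limit = lowered.get("x-ratelimit-limit", "?")
        return f"GitHub API rate limit nearly exhausted: {remaining}/{limit} requests left"
    return None
```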

📊 Output Integration

  • JSON: Adds maintenance field to each package with repository URL, last commit date, staleness flag, and days since update
  • CycloneDX: Includes maintenance data as component properties in SBOM
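
CycloneDX components carry a `properties` list of name/value pairs whose values are strings, so the maintenance fields have to be flattened and stringified. One possible mapping is sketched below; the `it-depends:maintenance` prefix and the function name are hypothetical, and the PR may name its properties differently:

```python
def maintenance_properties(maintenance: dict,
                           prefix: str = "it-depends:maintenance") -> list:
    """Flatten a maintenance dict into CycloneDX-style name/value properties."""
    properties = []
    for key in ("repository_url", "last_commit_date", "is_stale", "days_since_update"):
        value = maintenance.get(key)
        if value is None:
            continue  # omit unknown fields rather than emit empty values
        if isinstance(value, bool):
            value = "true" if value else "false"  # JSON-style booleans
        properties.append({"name": f"{prefix}:{key}", "value": str(value)})
    return properties
```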

Implementation Details

New Components

  • maintenance.py (380 lines): Core logic with GitHubClient, GitHub URL parser, and parallel maintenance checking
  • MaintenanceInfo class: Data structure for maintenance status with serialization support
  • GitHubMetadataCache table: SQLite caching for GitHub API responses
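
For illustration, a GitHub URL parser of the kind listed above might look like the following. This is a sketch, not the code in maintenance.py: the regular expression and the name `parse_github_url` are assumptions, and it handles only common https, git+https, and SSH forms.

```python
import re
from typing import Optional, Tuple

# Matches "github.com/owner/repo" and the SSH form "github.com:owner/repo".
_GITHUB_RE = re.compile(
    r"github\.com[/:](?P<owner>[A-Za-z0-9_.-]+)/(?P<repo>[A-Za-z0-9_.-]+)"
)


def parse_github_url(url: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) for a GitHub URL, or None if it is not one."""
    if not url:
        return None
    match = _GITHUB_RE.search(url)
    if match is None:
        return None
    repo = match.group("repo")
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # drop the clone-URL suffix
    return match.group("owner"), repo
```

Returning `None` for anything that is not a GitHub URL lets the caller skip such packages instead of failing.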

Modified Components

  • models.py: Extended Package class with maintenance_info field and update_maintenance_info() method
  • config.py: Added 4 new CLI flags (--check-maintenance, --stale-threshold, --github-token, --maintenance-cache-ttl)
  • _cli.py: Integrated maintenance check into main flow after vulnerability audit
  • sbom.py: Extended CycloneDX output with maintenance properties
  • Resolvers (npm, pip, cargo, go): Added get_repository_url() static methods

Architecture

Follows the same pattern as the existing audit.py vulnerability checking:

  • ThreadPoolExecutor for parallel processing
  • Non-breaking opt-in feature via CLI flag
  • Cache-aware with configurable TTL
  • Graceful error handling
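
The combination of a thread pool and graceful error handling can be sketched as follows. `check_all` and `fake_check` are hypothetical names, and `fake_check` stands in for a real GitHub lookup so the example runs offline; the PR's own implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def check_all(packages: Iterable[str], check_one: Callable[[str], dict],
              max_workers: int = 8) -> dict:
    """Run check_one for every package in parallel, recording failures instead of raising."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check_one, name): name for name in packages}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing lookup must not abort the run
                results[name] = {"error": str(exc)}
    return results


def fake_check(name: str) -> dict:
    """Stand-in for a real lookup: fails for one package, succeeds otherwise."""
    if name == "pip:no-repo":
        raise LookupError("no GitHub repository found")
    return {"is_stale": False, "error": None}
```

Calling `check_all(["pip:requests", "pip:no-repo"], fake_check)` yields a normal result for the first package and an `error` entry for the second, so a single failure never aborts the whole run.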

Usage Examples

# Basic usage with default threshold (365 days)
it-depends pip:requests --check-maintenance

# Custom staleness threshold (180 days)
it-depends npm:lodash --check-maintenance --stale-threshold 180

# With GitHub authentication for higher rate limits
export GITHUB_TOKEN=your_token_here
it-depends pip:requests --check-maintenance

# Combined with vulnerability audit
it-depends pip:requests --audit --check-maintenance

# Export to CycloneDX SBOM with maintenance data
it-depends pip:requests --check-maintenance --output-format cyclonedx

Example Output

{
  "pip:requests": {
    "2.31.0": {
      "name": "requests",
      "version": "2.31.0",
      "source": "pip",
      "dependencies": {...},
      "vulnerabilities": [],
      "maintenance": {
        "repository_url": "https://github.com/psf/requests",
        "last_commit_date": "2023-05-22T14:30:00Z",
        "is_stale": false,
        "days_since_update": 120,
        "error": null
      }
    }
  }
}

Testing

Unit Tests (test/test_maintenance.py)

  • ✅ GitHub URL extraction (8 test cases covering various URL formats)
  • ✅ MaintenanceInfo serialization and equality
  • ✅ GitHubClient API interactions (mocked)
  • ✅ Package integration
  • ✅ Resolver URL extraction for all 4 package managers

Manual Testing

  • Verified CLI flags appear in --help
  • Confirmed maintenance check runs with progress bar
  • Validated code compiles without errors

Documentation

  • ✅ Comprehensive README section with usage examples
  • ✅ API rate limit guidance
  • ✅ Supported package manager details
  • ✅ Example JSON output

Breaking Changes

None. This is a purely additive feature that:

  • Requires explicit opt-in via --check-maintenance flag
  • Maintains backward compatibility with existing outputs
  • Does not modify existing database schema (adds new optional table)

Files Changed

New Files (2)

  • src/it_depends/maintenance.py (380 lines)
  • test/test_maintenance.py (330 lines)

Modified Files (10)

  • src/it_depends/models.py (+59 lines)
  • src/it_depends/config.py (+24 lines)
  • src/it_depends/_cli.py (+10 lines)
  • src/it_depends/db.py (+12 lines)
  • src/it_depends/sbom.py (+36 lines)
  • src/it_depends/npm.py (+20 lines)
  • src/it_depends/pip.py (+28 lines)
  • src/it_depends/cargo.py (+23 lines)
  • src/it_depends/go.py (+20 lines)
  • README.md (+55 lines)

Total Impact: ~700 lines of new code across 12 files

Checklist

  • [x] Code follows project style guidelines
  • [x] Unit tests added and passing
  • [x] Documentation updated (README)
  • [x] Feature is opt-in and non-breaking
  • [x] Follows existing patterns (audit.py)
  • [x] Error handling implemented
  • [x] CLI help text added
  • [x] Caching implemented for performance

Future Enhancements

Potential follow-up work (not included in this PR):

  • Support for GitLab/Bitbucket repositories
  • Integration with GitHub's archived/deprecated repository status
  • Additional metrics (contributor count, issue response time)
  • HTML visualization of maintenance status

securingdev • Nov 24 '25 21:11

Thanks for tackling this - identifying unmaintained dependencies is a real security concern and a valuable addition to it-depends.

After reviewing the implementation, I have some suggestions for simplifying the approach. The current PR is ~700 lines across 12 files, but I think we can achieve the same goal with ~150 lines in 2-3 files by following the existing audit.py pattern more closely.

Key suggestions:

1. Follow the audit.py pattern exactly

The vulnerability checking feature (--audit) is self-contained in a single file, doesn't modify resolvers, and doesn't add custom caching. This maintenance feature could work the same way:

# maintenance.py - mirrors audit.py structure
from __future__ import annotations  # lets "str | None" annotations work on Python < 3.10

from dataclasses import dataclass

@dataclass
class MaintenanceInfo:
    repo_url: str | None
    last_commit: str | None
    days_since_update: int | None
    is_stale: bool
    error: str | None = None

def check_maintenance(packages, stale_days=365, github_token=None):
    """Enrich packages with maintenance info."""
    ...

2. Don't modify the resolvers

The PR adds get_repository_url() methods to 4 resolver files. Instead, query the registries inline within maintenance.py - the URLs are straightforward to fetch:

  • PyPI: https://pypi.org/pypi/{name}/json → info.project_urls
  • npm: https://registry.npmjs.org/{name} → repository.url
  • crates.io: https://crates.io/api/v1/crates/{name} → crate.repository
  • Go: parse from package name (already github.com/...)

This keeps the maintenance feature self-contained and doesn't expand the resolver interface.
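
As a rough illustration of how small the inline lookups could be, the helpers below pull a GitHub URL out of registry JSON that has already been fetched and parsed. The helper names are made up for this sketch, the field paths follow the list above, and only URLs that literally contain github.com are recognized (so shorthand forms such as github:user/repo would be skipped):

```python
from typing import Iterable, Optional


def _first_github_url(urls: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first URL that points at github.com, if any."""
    for url in urls:
        if url and "github.com" in url:
            return url
    return None


def repo_url_from_pypi(doc: dict) -> Optional[str]:
    # info.project_urls maps labels to URLs and may be null.
    project_urls = (doc.get("info") or {}).get("project_urls") or {}
    return _first_github_url(project_urls.values())


def repo_url_from_npm(doc: dict) -> Optional[str]:
    # repository is normally an object with a "url" key, but may be a string.
    repository = doc.get("repository")
    if isinstance(repository, dict):
        repository = repository.get("url")
    return _first_github_url([repository])


def repo_url_from_crates(doc: dict) -> Optional[str]:
    return _first_github_url([(doc.get("crate") or {}).get("repository")])


def repo_url_from_go(module_path: str) -> Optional[str]:
    # Go module paths hosted on GitHub already start with github.com/owner/repo.
    parts = module_path.split("/")
    if parts[0] == "github.com" and len(parts) >= 3:
        return "https://" + "/".join(parts[:3])
    return None
```

Keeping these as pure functions over parsed JSON separates the HTTP fetch from the extraction logic, which makes them easy to unit test without network access.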

3. Skip custom caching for now

The GitHubMetadataCache table adds complexity. The --audit feature doesn't cache OSV responses, and that works fine. We can add caching later if GitHub rate limits become a problem in practice.

4. Skip parallelism for v1

Sequential processing is fine for typical dependency trees (<100 packages). The ThreadPoolExecutor + tqdm adds complexity that can be added later if needed.

5. Use a dataclass

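The MaintenanceInfo class manually implements __init__, __eq__, __hash__, and to_obj(). A @dataclass with dataclasses.asdict() does this in 6 lines instead of 50 (use frozen=True if instances need to stay hashable, since a plain @dataclass that defines __eq__ sets __hash__ to None).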
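
A short sketch of what that buys, using the field names from the snippet under suggestion 1. `Optional[str]` is used here as the equivalent of `str | None`, and `frozen=True` is an assumption added so that instances stay hashable:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen=True generates __hash__ alongside __init__ and __eq__
class MaintenanceInfo:
    repo_url: Optional[str]
    last_commit: Optional[str]
    days_since_update: Optional[int]
    is_stale: bool
    error: Optional[str] = None


info = MaintenanceInfo("https://github.com/psf/requests", "2023-05-22T14:30:00Z", 120, False)
same = MaintenanceInfo("https://github.com/psf/requests", "2023-05-22T14:30:00Z", 120, False)

as_obj = asdict(info)       # plain dict, ready for json.dumps()
unique = {info, same}       # equal instances hash alike, so the set has one element
```

Here `asdict()` takes over the role of the hand-written to_obj(), and equality and hashing come from the decorator.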

Summary

Aspect             Current PR   Suggested
Lines of code      ~700         ~150
Files changed      12           2-3
New DB tables      1            0
Resolver changes   4 files      0
Config flags       4            2

The goal is great - let's just slim down the implementation to match the existing patterns in the codebase. Happy to help with a revised approach if that would be useful.

dguido • Nov 25 '25 16:11