Add feature to detect if a package is unmaintained
Add Package Maintenance Status Checking to SBOM
Summary
This PR adds functionality to check when packages in the Software Bill of Materials (SBOM) were last maintained, helping users identify stale or unmaintained dependencies in their software supply chain.
Motivation
Unmaintained dependencies pose significant security and reliability risks. This feature enables users to:
- Identify packages that haven't been updated in months or years
- Make informed decisions about dependency health
- Proactively address supply chain security concerns
- Comply with software supply chain security requirements
Key Features
🔍 Maintenance Checking
- Queries GitHub API to determine last commit date for packages
- Flags packages as "stale" based on configurable threshold (default: 365 days)
- Supports npm, pip, cargo, and go package managers
- Non-blocking: packages without GitHub repos are skipped gracefully
⚡ Performance & Reliability
- Parallel processing with progress bar feedback
- 24-hour caching of GitHub metadata to minimize API calls
- Rate limit tracking and warnings
- Supports both authenticated (5000 req/hr) and unauthenticated (60 req/hr) GitHub API access
📊 Output Integration
- JSON: Adds a `maintenance` field to each package with repository URL, last commit date, staleness flag, and days since update
- CycloneDX: Includes maintenance data as component properties in the SBOM
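CycloneDX component `properties` are a list of name/value pairs with string values, so the maintenance record has to be flattened. This is a minimal sketch. It assumes the record uses the keys from the example JSON output. The `it-depends:maintenance:` property-name prefix is a hypothetical namespace, not one the PR is known to use.

```python
def maintenance_properties(info: dict) -> list[dict[str, str]]:
    """Flatten a maintenance record into CycloneDX component properties.

    Property values must be strings, so booleans and integers are converted
    and missing (None) fields are skipped. The name prefix is hypothetical.
    """
    props = []
    for key in ("repository_url", "last_commit_date", "is_stale", "days_since_update"):
        value = info.get(key)
        if value is None:
            continue
        if isinstance(value, bool):
            value = str(value).lower()  # JSON-style "true"/"false"
        props.append({"name": f"it-depends:maintenance:{key}", "value": str(value)})
    return props
```

In an SBOM generator these entries would typically be appended to a component's existing `properties` list rather than replacing it.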
Implementation Details
New Components
- `maintenance.py` (380 lines): Core logic with `GitHubClient`, GitHub URL parser, and parallel maintenance checking
- `MaintenanceInfo` class: Data structure for maintenance status with serialization support
- `GitHubMetadataCache` table: SQLite caching for GitHub API responses
Modified Components
- `models.py`: Extended `Package` class with `maintenance_info` field and `update_maintenance_info()` method
- `config.py`: Added 4 new CLI flags (`--check-maintenance`, `--stale-threshold`, `--github-token`, `--maintenance-cache-ttl`)
- `_cli.py`: Integrated maintenance check into the main flow after the vulnerability audit
- `sbom.py`: Extended CycloneDX output with maintenance properties
- Resolvers (npm, pip, cargo, go): Added `get_repository_url()` static methods
Architecture
Follows the same pattern as the existing audit.py vulnerability checking:
- ThreadPoolExecutor for parallel processing
- Non-breaking opt-in feature via CLI flag
- Cache-aware with configurable TTL
- Graceful error handling
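The parallel check can be sketched with `concurrent.futures`. Here `check_all` and the injected `check` callable are hypothetical names. Progress reporting is left as a comment to keep the sketch dependency-free. Threads are a reasonable fit because each check spends most of its time waiting on HTTP.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def check_all(packages: list[str], check: Callable[[str], dict],
              max_workers: int = 8) -> dict[str, dict]:
    """Run `check` for every package concurrently and collect results by name.

    A failure in one check is recorded as an error entry instead of
    aborting the whole run (graceful degradation per package).
    """
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check, pkg): pkg for pkg in packages}
        # as_completed yields futures as they finish; a progress bar
        # (e.g. tqdm) would be advanced once per iteration here.
        for future in as_completed(futures):
            pkg = futures[future]
            try:
                results[pkg] = future.result()
            except Exception as exc:
                results[pkg] = {"error": str(exc)}
    return results
```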
Usage Examples
```bash
# Basic usage with default threshold (365 days)
it-depends pip:requests --check-maintenance

# Custom staleness threshold (180 days)
it-depends npm:lodash --check-maintenance --stale-threshold 180

# With GitHub authentication for higher rate limits
export GITHUB_TOKEN=your_token_here
it-depends pip:requests --check-maintenance

# Combined with vulnerability audit
it-depends pip:requests --audit --check-maintenance

# Export to CycloneDX SBOM with maintenance data
it-depends pip:requests --check-maintenance --output-format cyclonedx
```
Example Output
```json
{
  "pip:requests": {
    "2.31.0": {
      "name": "requests",
      "version": "2.31.0",
      "source": "pip",
      "dependencies": {...},
      "vulnerabilities": [],
      "maintenance": {
        "repository_url": "https://github.com/psf/requests",
        "last_commit_date": "2023-05-22T14:30:00Z",
        "is_stale": false,
        "days_since_update": 120,
        "error": null
      }
    }
  }
}
```
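Consumers can pull the flagged packages out of this report with a short script. This sketch assumes the report has already been parsed into a dict with the layout shown above: package key, then version, then record. `stale_packages` is a hypothetical helper.

```python
def stale_packages(report: dict) -> list[str]:
    """List "source:name@version" for every package flagged as stale.

    Records without a "maintenance" object (check not run) are ignored.
    """
    stale = []
    for versions in report.values():
        for version, record in versions.items():
            maintenance = record.get("maintenance") or {}
            if maintenance.get("is_stale"):
                stale.append(f'{record["source"]}:{record["name"]}@{version}')
    return sorted(stale)
```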
Testing
Unit Tests (test/test_maintenance.py)
- ✅ GitHub URL extraction (8 test cases covering various URL formats)
- ✅ MaintenanceInfo serialization and equality
- ✅ GitHubClient API interactions (mocked)
- ✅ Package integration
- ✅ Resolver URL extraction for all 4 package managers
Manual Testing
- Verified CLI flags appear in `--help`
- Confirmed maintenance check runs with progress bar
- Validated code compiles without errors
Documentation
- ✅ Comprehensive README section with usage examples
- ✅ API rate limit guidance
- ✅ Supported package manager details
- ✅ Example JSON output
Breaking Changes
None. This is a purely additive feature that:
- Requires explicit opt-in via the `--check-maintenance` flag
- Maintains backward compatibility with existing outputs
- Leaves existing database tables unchanged (only adds a new, optional table)
Files Changed
New Files (2)
- `src/it_depends/maintenance.py` (380 lines)
- `test/test_maintenance.py` (330 lines)
Modified Files (10)
- `src/it_depends/models.py` (+59 lines)
- `src/it_depends/config.py` (+24 lines)
- `src/it_depends/_cli.py` (+10 lines)
- `src/it_depends/db.py` (+12 lines)
- `src/it_depends/sbom.py` (+36 lines)
- `src/it_depends/npm.py` (+20 lines)
- `src/it_depends/pip.py` (+28 lines)
- `src/it_depends/cargo.py` (+23 lines)
- `src/it_depends/go.py` (+20 lines)
- `README.md` (+55 lines)
Total Impact: ~700 lines of new code across 12 files
Checklist
- [x] Code follows project style guidelines
- [x] Unit tests added and passing
- [x] Documentation updated (README)
- [x] Feature is opt-in and non-breaking
- [x] Follows existing patterns (audit.py)
- [x] Error handling implemented
- [x] CLI help text added
- [x] Caching implemented for performance
Future Enhancements
Potential follow-up work (not included in this PR):
- Support for GitLab/Bitbucket repositories
- Integration with GitHub's archived/deprecated repository status
- Additional metrics (contributor count, issue response time)
- HTML visualization of maintenance status
Thanks for tackling this - identifying unmaintained dependencies is a real security concern and a valuable addition to it-depends.
After reviewing the implementation, I have some suggestions for simplifying the approach. The current PR is ~700 lines across 12 files, but I think we can achieve the same goal with ~150 lines in 2-3 files by following the existing audit.py pattern more closely.
Key suggestions:
1. Follow the audit.py pattern exactly
The vulnerability checking feature (--audit) is self-contained in a single file, doesn't modify resolvers, and doesn't add custom caching. This maintenance feature could work the same way:
```python
# maintenance.py - mirrors audit.py structure
from dataclasses import dataclass


@dataclass
class MaintenanceInfo:
    repo_url: str | None
    last_commit: str | None
    days_since_update: int | None
    is_stale: bool
    error: str | None = None


def check_maintenance(packages, stale_days=365, github_token=None):
    """Enrich packages with maintenance info."""
    ...
```
2. Don't modify the resolvers
The PR adds get_repository_url() methods to 4 resolver files. Instead, query the registries inline within maintenance.py - the URLs are straightforward to fetch:
- PyPI: `https://pypi.org/pypi/{name}/json` → `info.project_urls`
- npm: `https://registry.npmjs.org/{name}` → `repository.url`
- crates.io: `https://crates.io/api/v1/crates/{name}` → `crate.repository`
- Go: parse from the module path (GitHub-hosted modules already start with `github.com/...`)
This keeps the maintenance feature self-contained and doesn't expand the resolver interface.
3. Skip custom caching for now
The GitHubMetadataCache table adds complexity. The --audit feature doesn't cache OSV responses, and that works fine. We can add caching later if GitHub rate limits become a problem in practice.
4. Skip parallelism for v1
Sequential processing is fine for typical dependency trees (<100 packages). The ThreadPoolExecutor + tqdm adds complexity that can be added later if needed.
5. Use a dataclass
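The MaintenanceInfo class manually implements `__init__`, `__eq__`, `__hash__`, and `to_obj()`. A `@dataclass(frozen=True)` plus `dataclasses.asdict()` covers all four in about 6 lines instead of 50 (`frozen=True` matters here, because a plain `@dataclass` sets `__hash__` to `None`).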
Summary
| Aspect | Current PR | Suggested |
|---|---|---|
| Lines of code | ~700 | ~150 |
| Files changed | 12 | 2-3 |
| New DB tables | 1 | 0 |
| Resolver changes | 4 files | 0 |
| Config flags | 4 | 2 |
The goal is great - let's just slim down the implementation to match the existing patterns in the codebase. Happy to help with a revised approach if that would be useful.