TorBot
Add database feature
Issue #315
Changes Proposed
- New database format for the --save flag: adds database as a new choice for the --save argument (main.py, lines 138-139; a sketch of the change follows this list)
- Core database module: creates src/torbot/modules/database.py, which implements the SearchResultsDatabase class for SQLite management; no external database server is required (uses the built-in sqlite3 module)
- Integration with LinkTree: adds a saveDatabase() method in src/torbot/modules/linktree.py (lines 159-195) that extracts all discovered links and metadata for persistent storage
- Query utilities: creates src/torbot/modules/db_query.py for result retrieval and scripts/query_database.py, a CLI for database operations
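The --save change amounts to one extra argparse choice. A minimal sketch, assuming the flag is declared with argparse (the pre-existing choices shown here are assumptions, not the actual main.py code):

```python
import argparse

parser = argparse.ArgumentParser(prog="torbot")
parser.add_argument(
    "--save",
    type=str,
    choices=["tree", "json", "database"],  # "database" is the option added by this PR;
                                           # the other choices are illustrative
    help="save results in the chosen format",
)
```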
Explanation of Changes
Database Engine & Architecture
- Engine: SQLite (file-based, no server required)
- Location: <project_root>/torbot_search_results.db
- Auto-initialized on first use (see the sketch below)
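"Auto-initialized" here just means sqlite3 creates the file on first connect. An illustrative sketch, where the constant and helper names are assumptions rather than the actual identifiers in database.py:

```python
import sqlite3
from pathlib import Path

# Hypothetical path constant; in the module it would resolve to the project root.
DB_PATH = Path("torbot_search_results.db")

def connect(db_path: Path = DB_PATH) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)           # creates the file on first use
    conn.execute("PRAGMA foreign_keys = ON")  # required for the CASCADE delete below
    return conn
```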
Database Schema
searches Table (Search Metadata)
- id (INTEGER PRIMARY KEY): Auto-incrementing search ID
- root_url (TEXT): The root URL that was crawled
- search_timestamp (DATETIME): ISO 8601 formatted timestamp of search
- depth (INTEGER): Crawl depth setting used
- total_links (INTEGER): Count of total links discovered
- links_data (TEXT): JSON array of all link metadata
- created_at (DATETIME): Record creation timestamp
links Table (Individual Link Records)
- id (INTEGER PRIMARY KEY): Auto-incrementing link ID
- search_id (INTEGER): Foreign key referencing searches table
- url (TEXT): Full URL of discovered link
- title (TEXT): Page title or hostname
- status_code (INTEGER): HTTP response code (200, 404, etc.)
- classification (TEXT): Content classification from NLP module
- accuracy (REAL): Classification confidence score (0.0-1.0)
- emails (TEXT): JSON array of emails found on page
- phone_numbers (TEXT): JSON array of phone numbers found
Relationship: one search has many links (1:N, enforced with ON DELETE CASCADE)
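The DDL implied by the field lists above could look like the following (a reconstruction; the exact constraints and column order in database.py may differ):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS searches (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    root_url         TEXT NOT NULL,
    search_timestamp DATETIME,
    depth            INTEGER,
    total_links      INTEGER,
    links_data       TEXT,      -- JSON array of link metadata
    created_at       DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS links (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id      INTEGER NOT NULL,
    url            TEXT NOT NULL,
    title          TEXT,
    status_code    INTEGER,
    classification TEXT,
    accuracy       REAL,        -- confidence score, 0.0-1.0
    emails         TEXT,        -- JSON array
    phone_numbers  TEXT,        -- JSON array
    FOREIGN KEY (search_id) REFERENCES searches(id) ON DELETE CASCADE
);
"""

def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
```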
Metadata Captured Per Search
Root-Level Metadata:
✅ Root URL being crawled
✅ Exact timestamp of the search (ISO 8601)
✅ Crawl depth configuration
✅ Total link count
Per-Link Metadata:
✅ Full URL
✅ Page title
✅ HTTP status code (connectivity indicator)
✅ Content classification (marketplace, forum, etc.)
✅ Classification accuracy/confidence
✅ Email addresses extracted
✅ Phone numbers extracted
Core Features (interface sketched after this list):
- Save results -> SearchResultsDatabase.save_search_results() -> stores the search plus its links
- Retrieve history -> get_search_history() -> query past searches with an optional URL filter
- Get details -> get_search_by_id() -> full search details with all links
- Close connection -> close() -> proper resource cleanup
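A minimal sketch of that interface, assuming link metadata arrives as dictionaries whose keys mirror the links table columns (method bodies and type hints are illustrative, not the actual database.py code):

```python
import json
import sqlite3
from datetime import datetime, timezone

class SearchResultsDatabase:
    """Sketch of the interface listed above; bodies are illustrative."""

    def __init__(self, db_path: str = "torbot_search_results.db") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute("PRAGMA foreign_keys = ON")

    def save_search_results(self, root_url: str, depth: int, links: list[dict]) -> int:
        """Store one search row plus one row per discovered link."""
        ts = datetime.now(timezone.utc).isoformat()
        cur = self._conn.execute(
            "INSERT INTO searches (root_url, search_timestamp, depth,"
            " total_links, links_data) VALUES (?, ?, ?, ?, ?)",
            (root_url, ts, depth, len(links), json.dumps(links)),
        )
        search_id = cur.lastrowid
        self._conn.executemany(
            "INSERT INTO links (search_id, url, title, status_code,"
            " classification, accuracy, emails, phone_numbers)"
            " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            [
                (
                    search_id,
                    link["url"],                        # key names are assumptions
                    link.get("title"),
                    link.get("status_code"),
                    link.get("classification"),
                    link.get("accuracy"),
                    json.dumps(link.get("emails", [])),
                    json.dumps(link.get("phone_numbers", [])),
                )
                for link in links
            ],
        )
        self._conn.commit()
        return search_id

    def get_search_history(self, url_filter: str | None = None) -> list[tuple]:
        """List past searches, optionally filtered by a root-URL substring."""
        query = "SELECT id, root_url, search_timestamp, total_links FROM searches"
        params: tuple = ()
        if url_filter:
            query += " WHERE root_url LIKE ?"
            params = (f"%{url_filter}%",)
        return self._conn.execute(query, params).fetchall()

    def get_search_by_id(self, search_id: int) -> list[tuple]:
        """Return all link rows recorded for one search."""
        return self._conn.execute(
            "SELECT url, title, status_code, classification, accuracy"
            " FROM links WHERE search_id = ?",
            (search_id,),
        ).fetchall()

    def close(self) -> None:
        self._conn.close()
```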
Usage
Basic save:
```
python main.py -u http://example.onion --depth 2 --save database
```
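Retrieving results afterwards, shown against the interface sketched above (the import path is assumed from the src/ layout; the real helpers in db_query.py and scripts/query_database.py may expose different names and flags):

```python
from torbot.modules.database import SearchResultsDatabase  # import path assumed

db = SearchResultsDatabase()
for search_id, root_url, timestamp, total in db.get_search_history(url_filter=".onion"):
    print(f"[{search_id}] {timestamp}  {root_url}  ({total} links)")
db.close()
```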
Benefits:
- Persistence: Search results survive program restarts
- Auditability: Full timestamp history of all crawls
- Queryability: Filter and search previous results
- Scalability: SQLite handles thousands of records efficiently
- No Dependencies: Uses Python's built-in sqlite3 module
- Relationship Integrity: Foreign keys prevent orphaned records
- Export Ready: JSON data format enables easy integration with other tools (see the export sketch below)
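Because link metadata is stored as JSON text, feeding it to other tools needs only the standard library (a sketch; column names match the schema above):

```python
import json
import sqlite3

conn = sqlite3.connect("torbot_search_results.db")
for root_url, links_data in conn.execute("SELECT root_url, links_data FROM searches"):
    links = json.loads(links_data)  # JSON text back to native Python objects
    print(root_url, "->", len(links), "links")
conn.close()
```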