Stirling PDF File History Specification
Overview
Stirling PDF tracks file history by embedding metadata directly in each PDF document's Keywords field. The metadata records the tool operations applied, the version progression, and the file's lineage through the processing pipeline.
PDF Metadata Format
Storage Mechanism
File history is stored in the PDF Keywords field as a single keyword entry: the prefix stirling-history: followed by a JSON string.
Metadata Structure
interface PDFHistoryMetadata {
  stirlingHistory: {
    originalFileId: string;    // UUID of the root file in the version chain
    parentFileId?: string;     // UUID of the immediate parent file
    versionNumber: number;     // Version number (1, 2, 3, etc.)
    toolChain: ToolOperation[]; // Array of applied tool operations, oldest first
    formatVersion: '1.0';      // Metadata format version
  };
}

interface ToolOperation {
  toolName: string;                  // Tool identifier (e.g., 'compress', 'sanitize')
  timestamp: number;                 // When the tool was applied (Unix epoch milliseconds)
  parameters?: Record<string, any>;  // Tool-specific parameters (optional)
}
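The storage format above can be illustrated with a minimal sketch that serializes the metadata into one keyword entry and reads it back. The helper names encodeHistoryKeyword and decodeHistoryKeyword are hypothetical and not taken from the Stirling PDF codebase, and the validation shown is only an assumed minimum.

```typescript
// Types mirror the Metadata Structure section so the sketch is self-contained.
interface ToolOperation {
  toolName: string;
  timestamp: number;
  parameters?: Record<string, any>;
}

interface PDFHistoryMetadata {
  stirlingHistory: {
    originalFileId: string;
    parentFileId?: string;
    versionNumber: number;
    toolChain: ToolOperation[];
    formatVersion: '1.0';
  };
}

const HISTORY_PREFIX = 'stirling-history:';

// Hypothetical helper: serialize the metadata into a single keyword entry.
function encodeHistoryKeyword(metadata: PDFHistoryMetadata): string {
  return HISTORY_PREFIX + JSON.stringify(metadata);
}

// Hypothetical helper: find and parse the history entry among a PDF's keywords.
// Returns null when no entry exists or its payload is not usable.
function decodeHistoryKeyword(keywords: string[]): PDFHistoryMetadata | null {
  const entry = keywords.find((k) => k.startsWith(HISTORY_PREFIX));
  if (entry === undefined) return null;
  try {
    const parsed: unknown = JSON.parse(entry.slice(HISTORY_PREFIX.length));
    if (typeof parsed !== 'object' || parsed === null) return null;
    const history = (parsed as Partial<PDFHistoryMetadata>).stirlingHistory;
    // Assumed minimal validation: only the fields needed for lineage are checked.
    if (!history || typeof history.originalFileId !== 'string' || !Array.isArray(history.toolChain)) {
      return null;
    }
    return parsed as PDFHistoryMetadata;
  } catch {
    return null; // Malformed JSON is treated the same as a missing entry.
  }
}
```

Treating a malformed entry as "no history" is a design choice of this sketch; a real implementation might prefer to log or surface the error.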
Standard PDF Metadata Fields Used
The system uses industry-standard PDF document information fields:
- Creator: Set to "Stirling-PDF" (identifies the application)
- Producer: Set to "Stirling-PDF" (identifies the PDF library/processor)
- Title, Author, Subject, CreationDate: Automatically preserved by pdf-lib during processing
- Keywords: Extended with the Stirling history entry while preserving user keywords
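Preserving user keywords while updating the history entry could look roughly like the sketch below. The function name mergeKeywords is hypothetical, and the sketch assumes the keywords are already available as an array of strings; how that array is read from and written back to the PDF (for example through pdf-lib) is outside its scope.

```typescript
const HISTORY_PREFIX = 'stirling-history:';

// Hypothetical helper: drop any previous history entry, keep user keywords
// in their original order, and append the new history entry at the end.
function mergeKeywords(existing: string[], historyEntry: string): string[] {
  const userKeywords = existing.filter((k) => !k.startsWith(HISTORY_PREFIX));
  return [...userKeywords, historyEntry];
}
```

Replacing the old entry rather than appending a second one keeps at most one stirling-history: keyword per document, which is an assumption of this sketch.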
Date Handling Strategy:
- PDF CreationDate: Preserved automatically (document creation date)
- File.lastModified: Source of truth for "when file was last changed" (original upload time or tool processing time)
- No duplication: Single timestamp approach using File.lastModified for all UI displays
Example PDF Document Information
PDF Document Info:
  Title: "User Document Title" (preserved from original)
  Author: "Document Author" (preserved from original)
  Creator: "Stirling-PDF"
  Producer: "Stirling-PDF"
  CreationDate: "2025-01-01T10:30:00Z" (preserved from original)
  Keywords: ["user-keyword", "stirling-history:{\"stirlingHistory\":{\"originalFileId\":\"abc123\",\"versionNumber\":2,\"toolChain\":[{\"toolName\":\"compress\",\"timestamp\":1756825614618},{\"toolName\":\"sanitize\",\"timestamp\":1756825631545}],\"formatVersion\":\"1.0\"}}"]

File System:
  lastModified: 1756825631545 (tool processing time - source of truth for "when file was last changed")
Version Numbering System
Version Progression
- v0: Original uploaded file (no Stirling PDF processing)
- v1: First tool applied to original file
- v2: Second tool applied (inherits from v1)
- v3: Third tool applied (inherits from v2)
- etc.
Version Relationships
document.pdf (v0)
    ↓ compress
document.pdf (v1: compress)
    ↓ sanitize
document.pdf (v2: compress → sanitize)
    ↓ ocr
document.pdf (v3: compress → sanitize → ocr)
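The progression shown above can be sketched as a function that derives a new version's history from its parent. The name deriveNextVersion is hypothetical, and the sketch assumes that a v0 file carries no history (passed as null) and that its own file ID becomes the originalFileId of every later version.

```typescript
interface ToolOperation {
  toolName: string;
  timestamp: number;
  parameters?: Record<string, any>;
}

interface StirlingHistory {
  originalFileId: string;
  parentFileId?: string;
  versionNumber: number;
  toolChain: ToolOperation[];
  formatVersion: '1.0';
}

// Hypothetical helper: build the history for the file produced by applying
// `operation` to the parent file. `parentHistory` is null for a v0 original.
function deriveNextVersion(
  parentFileId: string,
  parentHistory: StirlingHistory | null,
  operation: ToolOperation,
): StirlingHistory {
  return {
    // Assumption: the v0 file's ID is the root ID for the whole chain.
    originalFileId: parentHistory?.originalFileId ?? parentFileId,
    parentFileId,
    versionNumber: (parentHistory?.versionNumber ?? 0) + 1,
    // Copy the parent's chain so the parent's history is never mutated.
    toolChain: [...(parentHistory?.toolChain ?? []), operation],
    formatVersion: '1.0',
  };
}
```

Calling it with null for the first tool and then with each result in turn reproduces v1, v2, v3 with a growing toolChain and a constant originalFileId.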
File Lineage Tracking
Original File ID
The originalFileId remains constant throughout the entire version chain, enabling grouping of all versions of the same logical document.
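As an illustration of that grouping, the following sketch collects file records by originalFileId and orders each group by version. The VersionedFile shape and the groupByOriginal name are assumptions made for this example, not part of the specification.

```typescript
// Hypothetical record shape: one entry per stored file version.
interface VersionedFile {
  fileId: string;
  originalFileId: string;
  versionNumber: number;
}

// Group all versions of the same logical document, oldest version first.
function groupByOriginal(files: VersionedFile[]): Map<string, VersionedFile[]> {
  const groups = new Map<string, VersionedFile[]>();
  for (const file of files) {
    const group = groups.get(file.originalFileId) ?? [];
    group.push(file);
    groups.set(file.originalFileId, group);
  }
  groups.forEach((group) => group.sort((a, b) => a.versionNumber - b.versionNumber));
  return groups;
}
```

With the groups sorted this way, the last element of each group is the latest version of that document.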
Parent-Child Relationships
Each processed file references its immediate parent via parentFileId, creating a complete audit trail.
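A minimal sketch of reconstructing that trail follows parentFileId links back to the root. The FileRecord shape, the in-memory lookup map, and the auditTrail name are hypothetical; the cycle guard is a defensive assumption, since a well-formed chain should never loop.

```typescript
// Hypothetical record shape: only the fields needed to walk the chain.
interface FileRecord {
  fileId: string;
  parentFileId?: string;
}

// Return the chain of file IDs from the root file to `fileId`, oldest first.
function auditTrail(fileId: string, records: Map<string, FileRecord>): string[] {
  const trail: string[] = [];
  const seen = new Set<string>(); // Guards against malformed, cyclic metadata.
  let current: string | undefined = fileId;
  while (current !== undefined && !seen.has(current)) {
    seen.add(current);
    trail.push(current);
    // A file with no record (e.g. an unprocessed original) ends the walk.
    current = records.get(current)?.parentFileId;
  }
  return trail.reverse();
}
```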
Tool Chain
The toolChain array maintains the complete sequence of tool operations applied to reach the current version.