htmlSanityCheck
Make file suffixes configurable for HTML document discovery
Background
Currently, file suffix handling is inconsistent across the project:
- Gradle plugin: only searches for `.html` files
- Maven plugin: searches for `.html` and `.htm` files
- CLI: searches for `.html` and `.htm` files (configurable via the `--suffix` option)
The logic for discovering source documents is duplicated in each plugin/CLI implementation.
Proposed Enhancement
Make file suffixes a core configuration feature with centralized document discovery logic in AllChecksRunner.
Requirements
- Core Configuration: Add a `suffixes` attribute to `Configuration`
  - Type: `Set<String>`
  - Default: `["html", "htm"]`
  - Represents file extensions to search for (without the dot)
- Document Discovery in AllChecksRunner:
  - During `AllChecksRunner` initialization (constructor or `performAllChecks()`)
  - If `sourceDocuments` is null or empty AND `sourceDir` is set:
    - Recursively scan `sourceDir` for files matching the configured `suffixes`
    - Auto-populate the effective source documents for checking
  - If `sourceDocuments` is explicitly provided: use as-is (no auto-discovery)
- Plugin/CLI Updates:
  - Gradle plugin: Remove local file scanning (`setSourceDir()` logic), delegate to core
  - Maven plugin: Remove `findHtmlFiles()` method, delegate to core
  - CLI: Continue to support the `--suffix` option, pass to core `Configuration.suffixes`
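As a rough illustration of the core configuration requirement, the sketch below shows one way a `suffixes` attribute with the proposed default could look. `ConfigurationSketch` and `normalizeSuffixes` are hypothetical names, not the project's actual `Configuration` API. Normalizing user input (stripping leading dots, lowercasing) is an assumption that goes beyond what this issue specifies.

```java
import java.io.File;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, trimmed-down stand-in for the project's Configuration class.
final class ConfigurationSketch {
    // Default proposed in this issue: extensions without the dot.
    static final Set<String> DEFAULT_SUFFIXES = Set.of("html", "htm");

    private final File sourceDir;             // may be null
    private final Set<File> sourceDocuments;  // may be null or empty
    private final Set<String> suffixes;

    ConfigurationSketch(File sourceDir, Set<File> sourceDocuments, Collection<String> suffixes) {
        this.sourceDir = sourceDir;
        this.sourceDocuments = sourceDocuments;
        this.suffixes = normalizeSuffixes(suffixes);
    }

    // Assumption: input such as ".HTML" or " htm " is normalized to "html" / "htm";
    // null or effectively empty input falls back to the defaults.
    static Set<String> normalizeSuffixes(Collection<String> raw) {
        Set<String> result = new LinkedHashSet<>();
        if (raw != null) {
            for (String s : raw) {
                if (s == null) continue;
                String t = s.trim().toLowerCase(Locale.ROOT);
                while (t.startsWith(".")) t = t.substring(1);
                if (!t.isEmpty()) result.add(t);
            }
        }
        return result.isEmpty() ? DEFAULT_SUFFIXES : Set.copyOf(result);
    }

    File getSourceDir() { return sourceDir; }
    Set<File> getSourceDocuments() { return sourceDocuments; }
    Set<String> getSuffixes() { return suffixes; }
}
```

With something like this in place, the CLI could hand the raw values of its `--suffix` option to the constructor, and the Gradle and Maven plugins could simply leave the suffixes unset to get the defaults.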
Implementation Details
Location: Document discovery logic in AllChecksRunner class
- Keeps `Configuration` as a pure data holder
- Runner already orchestrates the checking process
- Natural place for auto-discovery before checks begin
- Example approach:
    import java.io.File;
    import java.util.Set;

    public class AllChecksRunner {
        private Set<File> effectiveSourceDocuments;

        public AllChecksRunner(Configuration config) {
            this.effectiveSourceDocuments = config.getSourceDocuments();

            // Auto-discover only when no documents were provided explicitly
            // and a source directory is available.
            if ((effectiveSourceDocuments == null || effectiveSourceDocuments.isEmpty())
                    && config.getSourceDir() != null) {
                this.effectiveSourceDocuments = discoverSourceDocuments(
                    config.getSourceDir(),
                    config.getSuffixes()
                );
            }
        }

        private Set<File> discoverSourceDocuments(File sourceDir, Set<String> suffixes) {
            // Recursive file scanning logic here
            throw new UnsupportedOperationException("not implemented yet");
        }
    }
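The recursive scan left open in the example above could be filled in roughly as follows. This is a minimal sketch, not the project's implementation: `SourceDocumentDiscovery` and `hasSuffix` are hypothetical names, the suffixes are assumed to be lowercase and without the dot, and matching file extensions case-insensitively is an assumption the issue does not state.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper; in the proposal this logic would live in AllChecksRunner.
final class SourceDocumentDiscovery {

    // Recursively collects all regular files below sourceDir whose extension
    // is contained in suffixes (expected lowercase, without the dot).
    static Set<File> discoverSourceDocuments(File sourceDir, Set<String> suffixes) {
        Set<File> found = new TreeSet<>(); // sorted for a deterministic order
        try (Stream<Path> paths = Files.walk(sourceDir.toPath())) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> hasSuffix(p, suffixes))
                 .forEach(p -> found.add(p.toFile()));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot scan " + sourceDir, e);
        }
        return found;
    }

    // Compares the text after the last dot of the file name, lowercased,
    // against the configured suffixes.
    static boolean hasSuffix(Path path, Set<String> suffixes) {
        String name = path.getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false; // no extension at all
        }
        return suffixes.contains(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

`Files.walk` is wrapped in try-with-resources because the returned stream holds open directory handles until it is closed. Whether an unreadable or missing `sourceDir` should fail the run, as it does here, or be reported as a finding is a design decision this sketch does not settle.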
Benefits
- ✅ Consistent behavior across all interfaces (CLI, Maven, Gradle)
- ✅ Single source of truth for file discovery logic
- ✅ User-configurable suffixes (e.g., for `.xhtml`, `.htm5`)
- ✅ Reduced code duplication
- ✅ Easier to test and maintain
- ✅ Clean separation: Configuration = data, AllChecksRunner = execution logic
Migration Path
- Add `suffixes` to `Configuration` with default `["html", "htm"]`
- Implement discovery logic in `AllChecksRunner`
- Update plugins/CLI to remove local scanning and rely on core
- Deprecate plugin-specific file discovery mechanisms
- Update documentation