Make file suffixes configurable for HTML document discovery

Open ascheman opened this issue 2 months ago • 0 comments

Background

Currently, file suffix handling is inconsistent across the project:

  • Gradle plugin: only searches for .html files
  • Maven plugin: searches for .html and .htm files
  • CLI: searches for .html and .htm files (configurable via --suffix option)

The logic for discovering source documents is duplicated in each plugin/CLI implementation.

Proposed Enhancement

Make file suffixes a core configuration feature with centralized document discovery logic in AllChecksRunner.

Requirements

  1. Core Configuration: Add a suffixes attribute to Configuration

    • Type: Set<String>
    • Default: ["html", "htm"]
    • Represents file extensions to search for (without the dot)
  2. Document Discovery in AllChecksRunner:

    • Run discovery during AllChecksRunner initialization (in the constructor or in performAllChecks())
    • If sourceDocuments is null or empty AND sourceDir is set:
      • Recursively scan sourceDir for files matching configured suffixes
      • Auto-populate the effective source documents for checking
    • If sourceDocuments is explicitly provided: use as-is (no auto-discovery)
  3. Plugin/CLI Updates:

    • Gradle plugin: Remove local file scanning (setSourceDir() logic), delegate to core
    • Maven plugin: Remove findHtmlFiles() method, delegate to core
    • CLI: Continue to support --suffix option, pass to core Configuration.suffixes
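
To make requirement 1 more concrete, here is a minimal sketch of how suffix values could be normalized and matched before they reach Configuration.suffixes. SuffixSupport, normalize() and matches() are hypothetical names for illustration only, not existing htmlSanityCheck API. The sketch also assumes case-insensitive matching and a fallback to the defaults when no usable suffix is supplied; the issue does not specify either behavior.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of htmlSanityCheck: illustrates suffix handling only.
final class SuffixSupport {
    // Default proposed in this issue: extensions without the leading dot.
    static final Set<String> DEFAULT_SUFFIXES = Set.of("html", "htm");

    private SuffixSupport() {}

    // Trims each entry, lower-cases it, drops one leading dot, and skips blanks.
    // Assumption: fall back to the defaults when nothing usable was supplied.
    static Set<String> normalize(Collection<String> raw) {
        Set<String> result = new LinkedHashSet<>();
        if (raw != null) {
            for (String s : raw) {
                if (s == null) continue;
                String t = s.trim().toLowerCase(Locale.ROOT);
                if (t.startsWith(".")) t = t.substring(1);
                if (!t.isEmpty()) result.add(t);
            }
        }
        return result.isEmpty() ? DEFAULT_SUFFIXES : result;
    }

    // True when the text after the last dot of fileName is a configured suffix.
    // Assumption: the comparison ignores case, so "index.HTML" matches "html".
    static boolean matches(String fileName, Set<String> suffixes) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) return false;
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return suffixes.contains(ext);
    }
}
```

With a helper like this, a CLI call such as --suffix .xhtml and a plugin setting of "xhtml" would both end up as the same value in the core configuration, and only the final extension is compared (page.html.bak would not match).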

Implementation Details

Location: Document discovery logic in AllChecksRunner class

  • Keeps Configuration as a pure data holder
  • Runner already orchestrates the checking process
  • Natural place for auto-discovery before checks begin
  • Example approach:
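    import java.io.File;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class AllChecksRunner {
        private Set<File> effectiveSourceDocuments;

        public AllChecksRunner(Configuration config) {
            this.effectiveSourceDocuments = config.getSourceDocuments();

            // Auto-discover only when no documents were provided explicitly
            if ((effectiveSourceDocuments == null || effectiveSourceDocuments.isEmpty())
                && config.getSourceDir() != null) {
                this.effectiveSourceDocuments = discoverSourceDocuments(
                    config.getSourceDir(),
                    config.getSuffixes()
                );
            }
        }

        // Sketch only: recursively collect regular files whose name ends with ".<suffix>"
        private Set<File> discoverSourceDocuments(File sourceDir, Set<String> suffixes) {
            try (Stream<Path> paths = Files.walk(sourceDir.toPath())) {
                return paths.filter(Files::isRegularFile)
                    .map(Path::toFile)
                    .filter(f -> suffixes.stream().anyMatch(s -> f.getName().endsWith("." + s)))
                    .collect(Collectors.toSet());
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot scan " + sourceDir, e);
            }
        }
    }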
    

Benefits

  • ✅ Consistent behavior across all interfaces (CLI, Maven, Gradle)
  • ✅ Single source of truth for file discovery logic
  • ✅ User-configurable suffixes (e.g., for .xhtml, .htm5)
  • ✅ Reduced code duplication
  • ✅ Easier to test and maintain
  • ✅ Clean separation: Configuration = data, AllChecksRunner = execution logic

Migration Path

  1. Add suffixes to Configuration with default ["html", "htm"]
  2. Implement discovery logic in AllChecksRunner
  3. Update plugins/CLI to remove local scanning and rely on core
  4. Deprecate plugin-specific file discovery mechanisms
  5. Update documentation

ascheman, Oct 29 '25 14:10