User description
This PR removes hardcoded preprocessing values and replaces them with a centralized configuration system.
✅ Summary of Changes
Introduced a YAML-based configuration file: config/default_config.yaml
Added a new config loader module: config/config_loader.py
Updated core processing modules to use config values instead of hardcoded constants:
src/constants/image_processing.py
src/defaults/config.py
src/processors/CropPage.py
src/processors/CropOnMarkers.py
Added safe fallbacks to ensure the system works even if config keys are missing
Improved flexibility by allowing configuration overrides without modifying code
🧠 Why This Change?
Previously, threshold values, kernel sizes, and other preprocessing parameters were hardcoded across the codebase, making it difficult to:
Tune OMR performance for different scanning qualities
Adjust thresholds for different sheet templates
Adapt the system for diverse lighting or contrast environments
This PR solves that by enabling external config-driven control, improving maintainability and user flexibility.
PR Type
Enhancement, Refactoring
Description
-
Introduced YAML-based configuration system with centralized config loader
-
Migrated hardcoded preprocessing values to external config file
-
Updated image processing modules to use config values with fallbacks
-
Simplified CropPage and CropOnMarkers classes to use config-driven parameters
Diagram Walkthrough
flowchart LR
A["config/default_config.yaml"] -->|load_config| B["config/config_loader.py"]
B -->|provides values| C["src/constants/image_processing.py"]
C -->|used by| D["src/processors/CropPage.py"]
C -->|used by| E["src/processors/CropOnMarkers.py"]
B -->|provides values| F["src/defaults/config.py"]
File Walkthrough
| Relevant files |
|---|
| Enhancement |
config_loader.pyNew config loader module for YAML parsing
config/config_loader.py
- New module that loads YAML configuration files
- Implements safe file existence check with error handling
- Returns parsed YAML as dictionary for use across codebase
|
+13/-0 |
image_processing.pyMigrate constants to config-driven values
src/constants/image_processing.py
- Integrated config loader to read preprocessing values from YAML
- Replaced hardcoded threshold values with config-driven values
- Updated MIN_PAGE_AREA_THRESHOLD, MAX_COSINE_THRESHOLD,
PAGE_THRESHOLD_PARAMS - Updated CANNY_PARAMS, DEFAULT_MAX_FEATURES, DEFAULT_GOOD_MATCH_PERCENT
with config fallbacks
|
+18/-9 |
|
| Configuration changes |
default_config.yamlNew centralized YAML configuration file
config/default_config.yaml
- New YAML configuration file with preprocessing parameters
- Defines thresholds, kernel sizes, and image processing values
- Includes paths, logging, OMR, and CLI configuration sections
- Replaces hardcoded constants with externally configurable values
|
+28/-0 |
|
| Refactoring |
config.pyRefactor defaults config with YAML integration
src/defaults/config.py
- Replaced DotMap-based configuration with dictionary-based approach
- Integrated config loader to read from YAML file
- Restructured DEFAULT_CONFIG to include threshold_params, paths, omr,
logging, and cli sections - Added fallback values for all config keys to ensure robustness
|
+36/-33 |
CropOnMarkers.pyRefactor CropOnMarkers with config integration
src/processors/CropOnMarkers.py
- Simplified class structure by removing inheritance from
ImagePreprocessor - Integrated config loader for marker matching and preprocessing
parameters - Refactored to use config-based threshold values and blur kernel
settings - Added new methods for image preprocessing, marker matching, and
threshold computation - Removed complex marker loading and template matching logic
|
+118/-240 |
CropPage.pyRefactor CropPage with config-driven parameters
src/processors/CropPage.py
- Simplified class structure by removing inheritance from
ImagePreprocessor - Integrated config loader for threshold and Canny edge detection
parameters - Refactored to use config-based values with fallbacks to constants
- Added dedicated methods for threshold application, edge detection, and
contour validation - Removed complex image utility dependencies and logging calls
|
+63/-110 |
|
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
| Security Compliance |
| 🔴 |
Unhandled config loading failure
Description: The config is loaded at module import time without error handling, which will cause the entire application to crash if the config file is missing or malformed, creating a denial of service vulnerability. image_processing.py [8-8]
Referred Code
config = load_config()
|
Module-level config loading crash
Description: The config is loaded at module import time in a processor class without error handling, causing application crashes if the config file is unavailable during import, leading to denial of service. CropOnMarkers.py [5-5]
Referred Code
config = load_config()
|
Import-time config failure
Description: The config is loaded at module import time without error handling, causing the application to fail during import if the config file is missing or corrupted, resulting in denial of service. CropPage.py [7-7]
Referred Code
config = load_config()
|
| ⚪ |
Unsafe YAML deserialization
Description: The YAML file is loaded without any validation or sanitization, allowing arbitrary Python code execution through YAML deserialization attacks if an attacker can control the config file contents. config_loader.py [12-13]
Referred Code
with open(config_path, "r") as file:
return yaml.safe_load(file)
|
| Ticket Compliance |
| ⚪ | 🎫 No ticket provided
|
| Codebase Duplication Compliance |
| ⚪ | Codebase context is not defined
Follow the guide to enable codebase context checks.
|
| Custom Compliance |
| 🟢 |
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance.
Status: Passed
|
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting
Status: Passed
|
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data.
Status: Passed
|
| 🔴 |
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation
Status: Missing YAML error handling: The yaml.safe_load() call does not handle potential YAML parsing errors, which could cause the application to crash with unclear error messages if the config file is malformed.
Referred Code
with open(config_path, "r") as file:
return yaml.safe_load(file)
|
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging.
Status: Path exposure in error: The FileNotFoundError at line 10 exposes the full file system path in the error message, which could reveal internal directory structure to end users.
Referred Code
if not os.path.exists(config_path):
raise FileNotFoundError(f"Config file not found at {config_path}")
|
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities
Status: Missing path validation: The config_path parameter is not validated for path traversal attacks, allowing potential access to arbitrary files outside the intended config directory.
Referred Code
def load_config(config_path="config/default_config.yaml"):
"""
Load YAML config file and return it as a dictionary.
Raises an error if the config file is missing.
"""
if not os.path.exists(config_path):
raise FileNotFoundError(f"Config file not found at {config_path}")
|
|
|
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
PR Code Suggestions ✨
Explore these optional code suggestions:
| Category | Suggestion | Impact |
| High-level |
PR scope exceeds configuration refactoring
The PR contains a major, undocumented rewrite of the CropPage and CropOnMarkers processors. This includes removing their ImagePreprocessor base class, which is a breaking architectural change that should be in a separate PR.
Examples:
src/processors/CropOnMarkers.py [14-131]
class CropOnMarkers:
def __init__(self, marker_ops=None):
# Load config or fallback to defaults
self.min_matching_threshold = marker_ops.get(
"min_matching_threshold",
config["omr"].get("bubble_threshold", 0.3)
) if marker_ops else config["omr"].get("bubble_threshold", 0.3)
self.gaussian_blur_kernel = config["preprocessing"].get(
"blur_kernel",
... (clipped 108 lines)
src/processors/CropPage.py [17-76]
class CropPage:
def __init__(self):
# Load values from config with fallback to constants
self.min_page_area_threshold = config["preprocessing"].get("min_page_area_threshold", MIN_PAGE_AREA_THRESHOLD)
self.max_cosine_threshold = config["preprocessing"].get("max_cosine_threshold", MAX_COSINE_THRESHOLD)
self.page_threshold_params = {
"thresh": config["preprocessing"].get("threshold", PAGE_THRESHOLD_PARAMS["threshold_value"]),
"maxval": PAGE_THRESHOLD_PARAMS["max_pixel_value"],
}
... (clipped 50 lines)
Solution Walkthrough:
Before:
# src/processors/CropOnMarkers.py
class CropOnMarkers(ImagePreprocessor):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
...
def apply_filter(self, image, file_path):
...
# Main processing logic is here
return image
# src/processors/CropPage.py
class CropPage(ImagePreprocessor):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
...
def apply_filter(self, image, file_path):
# Main processing logic is here
return image
After:
# src/processors/CropOnMarkers.py
class CropOnMarkers: # No longer inherits from ImagePreprocessor
def __init__(self, marker_ops=None):
# New initialization logic
...
def process(self, image, reference_markers):
# Completely new processing logic
...
return { ... }
# src/processors/CropPage.py
class CropPage: # No longer inherits from ImagePreprocessor
def __init__(self):
# New initialization logic
...
# Note: The main processing method like `apply_filter` is removed
# and not replaced with a clear entry point.
Suggestion importance[1-10]: 10
__
Why: This suggestion correctly identifies a critical issue where the PR's scope is much larger than described, including an undocumented rewrite and a breaking architectural change by removing the ImagePreprocessor base class from CropPage and CropOnMarkers.
| High
|
| Possible issue |
Fix incorrect contour validation logic
Fix a bug in is_valid_contour by returning False if a contour does not approximate to a quadrilateral (len(approx) != 4), preventing incorrect validation.
src/processors/CropPage.py [55-76]
def is_valid_contour(self, contour):
""" Validate contour based on config thresholds """
if cv2.contourArea(contour) < self.min_page_area_threshold:
return False
# Calculate max cosine for contour
peri = cv2.arcLength(contour, True)
approx = cv2.approxPolyDP(contour, 0.025 * peri, True)
+ if len(approx) != 4:
+ return False
+
max_cosine = 0
- if len(approx) == 4:
- for i in range(4):
- p1 = approx[i][0]
- p2 = approx[(i + 1) % 4][0]
- p3 = approx[(i + 2) % 4][0]
+ for i in range(4):
+ p1 = approx[i][0]
+ p2 = approx[(i + 1) % 4][0]
+ p3 = approx[(i + 2) % 4][0]
- v1 = p1 - p2
- v2 = p3 - p2
- cosine = abs(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
- max_cosine = max(max_cosine, cosine)
+ v1 = p1 - p2
+ v2 = p3 - p2
+ # Add a small epsilon to avoid division by zero for zero-length vectors
+ norm_v1 = np.linalg.norm(v1)
+ norm_v2 = np.linalg.norm(v2)
+ if norm_v1 == 0 or norm_v2 == 0:
+ continue # or handle as an invalid angle
+ cosine = abs(np.dot(v1, v2) / (norm_v1 * norm_v2))
+ max_cosine = max(max_cosine, cosine)
return max_cosine < self.max_cosine_threshold
Suggestion importance[1-10]: 9
__
Why: The suggestion identifies a critical bug where non-quadrilateral contours are incorrectly validated as True, which would lead to incorrect page detection. The proposed fix is correct and essential for the function's logic.
| High
|
Use absolute path for config file
To prevent FileNotFoundError, modify load_config to use an absolute path for the default configuration file, constructed relative to the script's own location.
config/config_loader.py [4-13]
-def load_config(config_path="config/default_config.yaml"):
+def load_config(config_path=None):
"""
Load YAML config file and return it as a dictionary.
Raises an error if the config file is missing.
"""
+ if config_path is None:
+ # Construct path relative to this file's location
+ base_dir = os.path.dirname(os.path.abspath(__file__))
+ config_path = os.path.join(base_dir, "default_config.yaml")
+
if not os.path.exists(config_path):
raise FileNotFoundError(f"Config file not found at {config_path}")
with open(config_path, "r") as file:
return yaml.safe_load(file)
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a potential FileNotFoundError due to the use of a relative path and proposes a robust fix by creating an absolute path, significantly improving the function's reliability.
| Medium
|
| General |
Handle mismatched marker image sizes
Enhance calculate_marker_similarity to handle markers of different sizes by resizing one image to match the other before comparison, and simplify the score calculation using np.mean().
src/processors/CropOnMarkers.py [63-74]
def calculate_marker_similarity(self, marker1, marker2):
""" Calculate similarity between markers (dummy implementation to be improved) """
# Convert to grayscale for comparison
m1 = cv2.cvtColor(marker1, cv2.COLOR_BGR2GRAY) if len(marker1.shape) == 3 else marker1
m2 = cv2.cvtColor(marker2, cv2.COLOR_BGR2GRAY) if len(marker2.shape) == 3 else marker2
if m1.shape != m2.shape:
- return 0
+ # Resize m2 to match m1's shape for comparison
+ m2 = cv2.resize(m2, (m1.shape[1], m1.shape[0]), interpolation=cv2.INTER_AREA)
diff = cv2.absdiff(m1, m2)
- score = 1 - (np.sum(diff) / (m1.size * 255))
+ # The score is 1 minus the average difference per pixel
+ score = 1 - (np.mean(diff) / 255.0)
return max(0, min(1, score)) # Clamp between 0 and 1
Suggestion importance[1-10]: 6
__
Why: The suggestion improves robustness by handling markers of different sizes through resizing, which is a reasonable enhancement. The simplification using np.mean() is a minor but good code quality improvement.
| Low
|
|
| |