Barcode detection causes false positives with plain numeric text
Description
The barcode/QR code detection feature (enabled by default with detect-codes=true) can produce false positives when processing images containing plain numeric text. When a false positive occurs, the OCR path is completely skipped, resulting in incorrect text extraction.
Steps to Reproduce
- Ensure detect-codes=true in settings (default)
- Capture a screenshot containing numeric text that resembles a barcode pattern (e.g., "91385057399027")
- NormCap detects it as a barcode and returns only the numbers
- OCR processing, which would have extracted the text properly, is skipped
Example Log
WARNING - === CODES detection enabled, calling detect_codes() ===
WARNING - === Found 1 raw results, codes=['91385057399027'] ===
WARNING - === Single code detected: '91385057399027' (type=TextType.SINGLE_LINE) ===
WARNING - === detect_codes() returned: DetectionResult(...detector=<TextDetector.BARCODE>) ===
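The decoded value in the log is 14 digits long, which is the length of a GS1 GTIN-14. The log does not show which symbology zxingcpp reported, so treating it as a GS1 code is an assumption. A natural first idea for filtering false positives is to reject reads with an invalid GS1 mod-10 check digit. The sketch below implements that standard check; the function name is hypothetical and not part of NormCap or zxingcpp. Run on the logged value, it returns True because the final 7 is a valid check digit. Checksum validation alone would therefore not have filtered out this false positive.

```python
def gs1_check_digit_ok(code: str) -> bool:
    """Hypothetical helper: True if `code` ends with a valid GS1 mod-10 check digit."""
    # Only plain ASCII digit strings can carry a GS1 check digit.
    if not (code.isascii() and code.isdigit()) or len(code) < 2:
        return False
    *body, check = (int(c) for c in code)
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


print(gs1_check_digit_ok("91385057399027"))  # True: 7 is the correct check digit
print(gs1_check_digit_ok("91385057399028"))  # False: wrong last digit
```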
Current Behavior
When detect-codes=true (default), the detection logic prioritizes barcode/QR code detection:
- If a code is detected (even falsely), OCR is skipped entirely
- Users get incomplete/incorrect text extraction without realizing why
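The short-circuit described above can be modeled as follows. This is a simplified sketch, not NormCap's actual code. `capture_to_text`, the callable parameters, and this minimal `DetectionResult` are hypothetical stand-ins chosen to mirror the log. It shows that once code detection returns anything, OCR is never reached.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DetectionResult:
    # Hypothetical stand-in for the DetectionResult seen in the log.
    text: str
    detector: str  # "BARCODE" or "OCR" in this sketch


def capture_to_text(
    image: object,
    detect_codes_enabled: bool,
    detect_codes: Callable[[object], Optional[DetectionResult]],
    run_ocr: Callable[[object], DetectionResult],
) -> DetectionResult:
    """Simplified model of the reported flow (hypothetical, not NormCap's code)."""
    if detect_codes_enabled:
        code_result = detect_codes(image)
        # Any code hit, even a false positive, returns immediately,
        # so run_ocr() is never called for this capture.
        if code_result is not None:
            return code_result
    return run_ocr(image)


# Usage: a fake detector that "finds" the digits from the log.
ocr_calls: List[object] = []


def fake_detect(image: object) -> Optional[DetectionResult]:
    return DetectionResult("91385057399027", "BARCODE")


def fake_ocr(image: object) -> DetectionResult:
    ocr_calls.append(image)
    return DetectionResult("Order no. 91385057399027", "OCR")


result = capture_to_text("screenshot", True, fake_detect, fake_ocr)
print(result.detector, result.text, len(ocr_calls))  # BARCODE 91385057399027 0
```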
Expected Behavior
Potential improvements:
- Reduce false positive rate: Configure zxingcpp with stricter thresholds
- Confidence scoring: Only skip OCR if barcode confidence is high
- Dual detection: Run both code detection and OCR, intelligently choose the better result
- User control: Make it easier to disable code detection when not needed
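The "dual detection" option could look roughly like the sketch below. Both detectors run, and a purely numeric code result is discarded in favor of OCR when the same digits appear in the OCR text alongside other content. The function, the `Result` type, and the heuristic itself are hypothetical illustrations, not existing NormCap or zxingcpp behavior.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    text: str
    detector: str  # "BARCODE" or "OCR" in this sketch


def choose_result(code: Optional[Result], ocr: Result) -> Result:
    """Pick between a code hit and OCR output (hypothetical heuristic)."""
    if code is None:
        return ocr
    payload = code.text.strip()
    # Non-numeric payloads (URLs, QR text, mixed content) are unlikely to be
    # misreads of printed digits, so trust the code detector.
    if not re.fullmatch(r"[0-9]+", payload):
        return code
    # Compare against OCR output with whitespace removed, since OCR may
    # split long digit runs with spaces.
    ocr_compact = re.sub(r"\s+", "", ocr.text)
    # The digits are visible as text and OCR found more than just them:
    # treat the code as a likely false positive and keep the fuller OCR text.
    if payload in ocr_compact and len(ocr_compact) > len(payload):
        return ocr
    return code


code = Result("91385057399027", "BARCODE")
print(choose_result(code, Result("订单号 91385057399027", "OCR")).detector)  # OCR
print(choose_result(code, Result("91385057399027", "OCR")).detector)  # BARCODE
```

One trade-off: a real printed barcode whose human-readable digits sit next to other text would also resolve to OCR. Running OCR on every capture also adds latency when the code was genuine.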
Environment
- NormCap version: latest (main branch)
- Platform: Linux
- Settings: default (detect-codes=true)
Impact
This affects users processing:
- Mixed text with numbers (especially Asian text with digits)
- Documents with number sequences
- Any content where numeric patterns might trigger false barcode detection
Workaround
Manually disable code detection:
# In settings GUI: uncheck "Detect codes"
# Or edit config:
sed -i 's/detect-codes=true/detect-codes=false/' ~/.config/normcap/settings.conf
Related
This issue was discovered while testing PR #801 (smart whitespace stripping for CJK text).