Barcode detection causes false positives with plain numeric text

Open • shukebeta opened this issue 2 months ago • 0 comments

Description

The barcode/QR code detection feature (enabled by default with detect-codes=true) can produce false positives when processing images containing plain numeric text. When a false positive occurs, the OCR path is completely skipped, resulting in incorrect text extraction.

Steps to Reproduce

  1. Ensure detect-codes=true in settings (default)
  2. Capture a screenshot containing numeric text that resembles a barcode pattern (e.g., "91385057399027")
  3. NormCap reports the text as a barcode and returns only the digits
  4. OCR, which would have extracted the full text, is skipped

Example Log

WARNING - === CODES detection enabled, calling detect_codes() ===
WARNING - === Found 1 raw results, codes=['91385057399027'] ===
WARNING - === Single code detected: '91385057399027' (type=TextType.SINGLE_LINE) ===
WARNING - === detect_codes() returned: DetectionResult(...detector=<TextDetector.BARCODE>) ===

Current Behavior

When detect-codes=true (default), the detection logic prioritizes barcode/QR code detection:

  • If a code is detected (even falsely), OCR is skipped entirely
  • Users get incomplete/incorrect text extraction without realizing why
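
The short-circuit described above can be illustrated with a minimal sketch. This is not NormCap's actual code: `detect_text`, `run_ocr`, and the `codes_enabled` parameter are hypothetical names chosen for illustration, and the detectors are passed in as plain callables so the example runs without any OCR or barcode library.

```python
from typing import Callable, Optional


def detect_text(
    image: object,
    detect_codes: Callable[[object], Optional[str]],
    run_ocr: Callable[[object], str],
    codes_enabled: bool = True,
) -> str:
    """Return the code text if any code is reported, otherwise fall back to OCR."""
    if codes_enabled:
        code_text = detect_codes(image)
        if code_text:
            # Short-circuit: once a code is reported (even falsely),
            # OCR never runs and the surrounding text is lost.
            return code_text
    return run_ocr(image)


# Stand-in detectors simulating the false positive from the log above.
fake_codes = lambda image: "91385057399027"
fake_ocr = lambda image: "Order no. 91385057399027 shipped"

with_codes = detect_text("screenshot", fake_codes, fake_ocr)
without_codes = detect_text("screenshot", fake_codes, fake_ocr, codes_enabled=False)
```

With code detection enabled the caller only sees the digits; with it disabled the full OCR text comes back, which matches the workaround described further down.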

Expected Behavior

Potential improvements:

  1. Reduce false positive rate: Configure zxingcpp with stricter thresholds
  2. Confidence scoring: Only skip OCR if barcode confidence is high
  3. Dual detection: Run both code detection and OCR, intelligently choose the better result
  4. User control: Make it easier to disable code detection when not needed
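
As a rough illustration of option 3, the following sketch runs on the outputs of both detectors and prefers the OCR text whenever it contains the detected code plus additional content. The function name `choose_result` and the selection heuristic are assumptions made for this example, not an existing NormCap or zxingcpp API, and the heuristic would need tuning against real captures.

```python
from typing import Optional


def choose_result(code_text: Optional[str], ocr_text: str) -> str:
    """Pick between a detected code and OCR output (hypothetical heuristic)."""
    if not code_text:
        return ocr_text

    # Compare with all whitespace removed so line breaks and spacing
    # differences in the OCR output do not hide a match.
    compact_code = "".join(code_text.split())
    compact_ocr = "".join(ocr_text.split())

    # If OCR saw the same characters plus more, the "code" is probably
    # just part of ordinary text, so keep the richer OCR result.
    if compact_code in compact_ocr and len(compact_ocr) > len(compact_code):
        return ocr_text
    return code_text


mixed = choose_result("91385057399027", "订单号 91385057399027 已发货")
digits_only = choose_result("91385057399027", "91385057399027")
qr_only = choose_result("https://example.com", "")
```

One limitation of this heuristic: a capture that contains a real barcode next to unrelated text that happens to repeat its digits would also resolve to the OCR result, so a real implementation might want to combine it with positional or confidence information.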

Environment

  • NormCap version: latest (main branch)
  • Platform: Linux
  • Settings: default (detect-codes=true)

Impact

This affects users processing:

  • Mixed text with numbers (especially Asian text with digits)
  • Documents with number sequences
  • Any content where numeric patterns might trigger false barcode detection

Workaround

Manually disable code detection:

# In settings GUI: uncheck "Detect codes"
# Or edit config:
sed -i 's/detect-codes=true/detect-codes=false/' ~/.config/normcap/settings.conf

Related

This issue was discovered while testing PR #801 (smart whitespace stripping for CJK text).

shukebeta • Oct 13 '25 10:10