Python icon indicating copy to clipboard operation
Python copied to clipboard

Add Rabin-Karp String Matching Algorithm

Open vivekkumarrathour opened this issue 1 month ago • 0 comments

Feature description

Feature Description

The repository has several string matching algorithms (KMP, Boyer-Moore, Naive Search) but is missing the Rabin-Karp algorithm, which is an important string matching algorithm that uses hashing for pattern searching. This is particularly useful for multiple pattern searches and plagiarism detection.

Proposed Implementation

I propose adding a Rabin-Karp string matching algorithm in the strings/ directory with the following features:

  1. Single pattern matching using rolling hash technique
  2. Multiple pattern matching - search for multiple patterns in a single pass
  3. Configurable hash parameters (base and modulus)
  4. Collision handling - verify matches to avoid false positives
  5. Time complexity: O(n + m) average case, O(nm) worst case
  6. Comprehensive doctests with edge cases (empty strings, no matches, multiple matches)
  7. Type hints and clear documentation

Why This Enhancement?

  • Educational Value: Rabin-Karp introduces the concept of rolling hash, an important technique in computer science
  • Practical Applications:
    • Plagiarism detection (comparing documents)
    • DNA sequence matching in bioinformatics
    • Substring search in large texts
    • Finding duplicate files
  • Complements existing algorithms: Provides comparison with KMP and Boyer-Moore
  • Multiple pattern searching: Unlike other algorithms, Rabin-Karp can efficiently search for multiple patterns simultaneously

Key Features to Highlight

  • Rolling hash technique: Demonstrates efficient hash recalculation
  • Spurious hit handling: Shows how to deal with hash collisions
  • Modular design: Separate functions for single and multiple pattern matching
  • Performance comparison: Can be compared with naive, KMP, and Boyer-Moore approaches

Example Applications

# Single pattern search
text = "hello world hello"
pattern = "hello"
indices = rabin_karp_search(text, pattern)  # Returns [0, 12]

# Multiple pattern search
patterns = ["hello", "world"]
matches = rabin_karp_multiple(text, patterns)
# Returns {"hello": [0, 12], "world": [6]}

Benefits

  • Teaches rolling hash concept
  • Useful for real-world string matching problems
  • Shows trade-offs between different string matching algorithms
  • Demonstrates practical applications of hashing

I'm willing to implement this following the repository's guidelines if approved.

vivekkumarrathour avatar Nov 17 '25 16:11 vivekkumarrathour