Add Rabin-Karp String Matching Algorithm

Open vivekkumarrathour opened this issue 1 month ago • 0 comments

Feature description

Feature Description

The repository has several string matching algorithms (KMP, Boyer-Moore, Naive Search) but is missing the Rabin-Karp algorithm, which is an important string matching algorithm that uses hashing for pattern searching. This is particularly useful for multiple pattern searches and plagiarism detection.

Proposed Implementation

I propose adding a Rabin-Karp string matching algorithm in the strings/ directory with the following features:

Single pattern matching using rolling hash technique
Multiple pattern matching - search for multiple patterns in a single pass
Configurable hash parameters (base and modulus)
Collision handling - verify matches to avoid false positives
Time complexity: O(n + m) average case, O(nm) worst case
Comprehensive doctests with edge cases (empty strings, no matches, multiple matches)
Type hints and clear documentation

Why This Enhancement?

Educational Value: Rabin-Karp introduces the concept of rolling hash, an important technique in computer science
Practical Applications:
- Plagiarism detection (comparing documents)
- DNA sequence matching in bioinformatics
- Substring search in large texts
- Finding duplicate files
Complements existing algorithms: Provides comparison with KMP and Boyer-Moore
Multiple pattern searching: Unlike other algorithms, Rabin-Karp can efficiently search for multiple patterns simultaneously

Key Features to Highlight

Rolling hash technique: Demonstrates efficient hash recalculation
Spurious hit handling: Shows how to deal with hash collisions
Modular design: Separate functions for single and multiple pattern matching
Performance comparison: Can be compared with naive, KMP, and Boyer-Moore approaches

Example Applications

# Single pattern search
text = "hello world hello"
pattern = "hello"
indices = rabin_karp_search(text, pattern)  # Returns [0, 12]

# Multiple pattern search
patterns = ["hello", "world"]
matches = rabin_karp_multiple(text, patterns)
# Returns {"hello": [0, 12], "world": [6]}

Benefits

Teaches rolling hash concept
Useful for real-world string matching problems
Shows trade-offs between different string matching algorithms
Demonstrates practical applications of hashing

I'm willing to implement this following the repository's guidelines if approved.

Nov 17 '25 16:11 vivekkumarrathour