Python
Python copied to clipboard
Add Rabin-Karp String Matching Algorithm
Feature description
Feature Description
The repository has several string matching algorithms (KMP, Boyer-Moore, Naive Search) but is missing the Rabin-Karp algorithm, which is an important string matching algorithm that uses hashing for pattern searching. This is particularly useful for multiple pattern searches and plagiarism detection.
Proposed Implementation
I propose adding a Rabin-Karp string matching algorithm in the strings/ directory with the following features:
- Single pattern matching using rolling hash technique
- Multiple pattern matching - search for multiple patterns in a single pass
- Configurable hash parameters (base and modulus)
- Collision handling - verify matches to avoid false positives
- Time complexity: O(n + m) average case, O(nm) worst case
- Comprehensive doctests with edge cases (empty strings, no matches, multiple matches)
- Type hints and clear documentation
Why This Enhancement?
- Educational Value: Rabin-Karp introduces the concept of rolling hash, an important technique in computer science
- Practical Applications:
- Plagiarism detection (comparing documents)
- DNA sequence matching in bioinformatics
- Substring search in large texts
- Finding duplicate files
- Complements existing algorithms: Provides comparison with KMP and Boyer-Moore
- Multiple pattern searching: Unlike other algorithms, Rabin-Karp can efficiently search for multiple patterns simultaneously
Key Features to Highlight
- Rolling hash technique: Demonstrates efficient hash recalculation
- Spurious hit handling: Shows how to deal with hash collisions
- Modular design: Separate functions for single and multiple pattern matching
- Performance comparison: Can be compared with naive, KMP, and Boyer-Moore approaches
Example Applications
# Single pattern search
text = "hello world hello"
pattern = "hello"
indices = rabin_karp_search(text, pattern) # Returns [0, 12]
# Multiple pattern search
patterns = ["hello", "world"]
matches = rabin_karp_multiple(text, patterns)
# Returns {"hello": [0, 12], "world": [6]}
Benefits
- Teaches rolling hash concept
- Useful for real-world string matching problems
- Shows trade-offs between different string matching algorithms
- Demonstrates practical applications of hashing
I'm willing to implement this following the repository's guidelines if approved.