The Rabin Karp algorithm is a popular technique used to find a pattern within a string or a sequence of characters. Developed by Michael O. Rabin and Richard M. Karp in 1987, this algorithm is particularly efficient for multiple-pattern searches and serves as an excellent demonstration of how hashing can simplify complex problems.
At its core, Rabin Karp employs hashing to perform substring searches. Instead of comparing the pattern to every substring of the text individually, the algorithm computes a hash value for the pattern and a hash value for each substring of the text of the same length. When hash values match, the algorithm verifies the actual substring and pattern for a potential match. This method significantly reduces the number of character comparisons, especially when dealing with longer texts and multiple patterns.
Let’s break down the Rabin Karp algorithm into digestible steps using a simple example.
Suppose we have a text: "ababcababcabab"
and we want to find the pattern: "abc"
.
We need to choose a hashing function. A simple approach could be to use a rolling hash function that computes a hash for each substring.
Calculate the hash of the pattern. For our example, if we denote the ASCII values of characters, the hash for "abc" can be calculated as follows:
hash("abc") = (a * d^0 + b * d^1 + c * d^2) % q
where d is a constant (e.g., 256 for ASCII), and q is a large prime number (to reduce collisions).
Using ASCII values for 'a', 'b', and 'c', we could compute:
hash("abc") = (97 * 256^2 + 98 * 256^1 + 99 * 256^0) % 101
hash_value_pattern = computed_hash_value
Calculate the hash for the first substring (length equal to the pattern length) of the text "abc"
(which equals text[0:3]
):
hash("abc") = hash_value_substring_0
If the hash of the substring matches the hash of the pattern, check for an actual match (to handle possible collisions). This involves a character-by-character comparison.
Slide the window by one character: Calculate the hash for the next substring using the rolling hash technique:
hash(new_substring) = (d * (hash(previous_substring) - text[left] * d^(m-1)) + text[right]) % q
Repeat the process until you've gone through the entire text.
Below is a Python implementation of the Rabin Karp algorithm:
def rabin_karp(text, pattern): d = 256 # Number of characters in the input alphabet q = 101 # A prime number m = len(pattern) n = len(text) p = 0 # hash value for pattern t = 0 # hash value for text h = 1 # The value of h would be "pow(d, m-1)%q" for i in range(m - 1): h = (h * d) % q # Calculate the hash value of pattern and first window of text for i in range(m): p = (d * p + ord(pattern[i])) % q t = (d * t + ord(text[i])) % q # Slide the pattern over text for i in range(n - m + 1): # Check the hash values of the current window of text and pattern if p == t: # Check for characters one by one if text[i:i + m] == pattern: print(f"Pattern found at index {i}") # Calculate hash for the next window if i < n - m: t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q # We might get negative value of t, converting it to positive if t < 0: t = t + q
The average and best-case time complexity of the Rabin Karp algorithm is O(n + m), where n is the length of the text and m is the length of the pattern. However, in the worst-case scenario, the time complexity could degrade to O(n * m) due to hash collisions requiring character comparisons.
The Rabin Karp algorithm is particularly useful in applications where multiple pattern searches are needed, such as:
The Rabin Karp algorithm offers a fascinating perspective on pattern matching through hashing. Its simplicity and efficiency in handling multiple patterns make it a valuable tool in various applications. By leveraging the power of mathematical hashing, Rabin Karp exemplifies how computational techniques can optimize string processing tasks, making it an essential topic in advanced string-based interview techniques.
13/10/2024 | DSA
06/12/2024 | DSA
23/09/2024 | DSA
16/11/2024 | DSA
08/12/2024 | DSA
06/12/2024 | DSA
16/11/2024 | DSA
08/12/2024 | DSA
16/11/2024 | DSA
06/12/2024 | DSA
16/11/2024 | DSA