Rabin-Karp
Concept
Rabin-Karp uses hashing for string matching. Compute a hash of the pattern and a rolling hash of each window of the text. If hashes match, verify character by character to avoid false positives. The rolling hash updates in O(1) by removing the leftmost character and adding the new rightmost character.
When to Use
- ▸Finding pattern occurrences with hash-based matching
- ▸Multiple pattern search (compute hash for each pattern)
- ▸Plagiarism detection or document similarity
- ▸When you need a simple hash-based approach and KMP feels overkill
Intuition
Instead of comparing characters one by one, compute a 'fingerprint' (hash) of the pattern. Slide a window across the text and compute the hash of each window. If the fingerprints match, the strings might match — verify to be sure. The rolling hash trick: when sliding the window right by one character, subtract the contribution of the old leftmost character and add the new rightmost character. This makes each window's hash O(1) to compute after the first one.
Complexity
Template Code
#include<bits/stdc++.h>
using namespace std;
using ll = long long int;
const ll mod = 1e9+7;
const ll base = 31;
int main(){
ios_base::sync_with_stdio(0), cin.tie(0);
string text, pat;
cin >> text >> pat;
int n = text.size(), m = pat.size();
if(m > n){
cout << "0 occurrences\n";
return 0;
}
// Compute base^m % mod
ll pw = 1;
for(int i = 0; i < m; i++){
pw = pw * base % mod;
}
// Hash of pattern
ll hashPat = 0;
for(int i = 0; i < m; i++){
hashPat = (hashPat * base + pat[i]) % mod;
}
// Hash of first window
ll hashWin = 0;
for(int i = 0; i < m; i++){
hashWin = (hashWin * base + text[i]) % mod;
}
int cnt = 0;
for(int i = 0; i <= n - m; i++){
if(hashWin == hashPat){
// Verify to avoid false positive
bool match = true;
for(int j = 0; j < m; j++){
if(text[i+j] != pat[j]){
match = false;
break;
}
}
if(match){
cout << i << "\n";
cnt++;
}
}
// Roll the hash forward
if(i + m < n){
hashWin = (hashWin * base - text[i] * pw % mod + text[i+m] + mod * 2) % mod;
}
}
cout << cnt << " occurrences\n";
return 0;
}
// Sample Input:
// abracadabra
// abra
//
// Sample Output:
// 0
// 7
// 2 occurrences