Rabin-Karp

Concept

Rabin-Karp uses hashing for string matching. Compute a hash of the pattern and a rolling hash of each window of the text. If hashes match, verify character by character to avoid false positives. The rolling hash updates in O(1) by removing the leftmost character and adding the new rightmost character.

When to Use

  • Finding pattern occurrences with hash-based matching
  • Multiple pattern search (compute hash for each pattern)
  • Plagiarism detection or document similarity
  • When you need a simple hash-based approach and KMP feels overkill

Intuition

Instead of comparing characters one by one, compute a 'fingerprint' (hash) of the pattern. Slide a window across the text and compute the hash of each window. If the fingerprints match, the strings might match — verify to be sure. The rolling hash trick: when sliding the window right by one character, subtract the contribution of the old leftmost character and add the new rightmost character. This makes each window's hash O(1) to compute after the first one.

Complexity

Time
O(n + m) average, O(n*m) worst case
Space
O(1) extra

Template Code

cpp
#include<bits/stdc++.h>
using namespace std;

using ll = long long int;
const ll mod = 1e9+7;
const ll base = 31;

int main(){
    ios_base::sync_with_stdio(0), cin.tie(0);

    string text, pat;
    cin >> text >> pat;

    int n = text.size(), m = pat.size();
    if(m > n){
        cout << "0 occurrences\n";
        return 0;
    }

    // Compute base^m % mod
    ll pw = 1;
    for(int i = 0; i < m; i++){
        pw = pw * base % mod;
    }

    // Hash of pattern
    ll hashPat = 0;
    for(int i = 0; i < m; i++){
        hashPat = (hashPat * base + pat[i]) % mod;
    }

    // Hash of first window
    ll hashWin = 0;
    for(int i = 0; i < m; i++){
        hashWin = (hashWin * base + text[i]) % mod;
    }

    int cnt = 0;
    for(int i = 0; i <= n - m; i++){
        if(hashWin == hashPat){
            // Verify to avoid false positive
            bool match = true;
            for(int j = 0; j < m; j++){
                if(text[i+j] != pat[j]){
                    match = false;
                    break;
                }
            }
            if(match){
                cout << i << "\n";
                cnt++;
            }
        }

        // Roll the hash forward
        if(i + m < n){
            hashWin = (hashWin * base - text[i] * pw % mod + text[i+m] + mod * 2) % mod;
        }
    }

    cout << cnt << " occurrences\n";

    return 0;
}

// Sample Input:
// abracadabra
// abra
//
// Sample Output:
// 0
// 7
// 2 occurrences