KMP Pattern Matching

Concept

KMP (Knuth-Morris-Pratt) finds all occurrences of a pattern in a text in O(n + m) time. It precomputes a failure function (also called prefix function or LPS array) that tells us the longest proper prefix of the pattern that is also a suffix. This allows skipping characters during mismatches instead of restarting from scratch.

When to Use

  • Finding all occurrences of a pattern in a text
  • Counting how many times a pattern appears in a string
  • Problems involving repeated patterns or periodic strings
  • When brute force O(n*m) string matching is too slow

Intuition

In naive matching, when a mismatch occurs, you restart from the beginning of the pattern. KMP is smarter — it asks 'what prefix of the pattern have I already matched that is also a suffix of what I've seen?' and jumps there instead. The failure function precomputes these jumps. For example, if the pattern is 'ABCABD' and you matched 'ABCAB' before failing, the failure function tells you 'AB' at the start matches 'AB' at the end, so jump to position 2 instead of 0.

Complexity

Time
O(n + m)
Space
O(m)

Template Code

cpp
#include<bits/stdc++.h>
using namespace std;

vector<int> buildLPS(string &pat){
    int m = pat.size();
    vector<int> lps(m, 0);
    int len = 0;
    int i = 1;

    while(i < m){
        if(pat[i] == pat[len]){
            len++;
            lps[i] = len;
            i++;
        }
        else{
            if(len != 0){
                len = lps[len-1];
            }
            else{
                lps[i] = 0;
                i++;
            }
        }
    }

    return lps;
}

int main(){
    ios_base::sync_with_stdio(0), cin.tie(0);

    string text, pat;
    cin >> text >> pat;

    int n = text.size(), m = pat.size();
    vector<int> lps = buildLPS(pat);

    int cnt = 0;
    int i = 0, j = 0;
    while(i < n){
        if(text[i] == pat[j]){
            i++;
            j++;
        }

        if(j == m){
            cout << i - j << "\n";
            cnt++;
            j = lps[j-1];
        }
        else if(i < n && text[i] != pat[j]){
            if(j != 0){
                j = lps[j-1];
            }
            else{
                i++;
            }
        }
    }

    cout << cnt << " occurrences\n";

    return 0;
}

// Sample Input:
// ababababca
// abab
//
// Sample Output:
// 0
// 2
// 4
// 3 occurrences