KMP Pattern Matching
Concept
KMP (Knuth-Morris-Pratt) finds all occurrences of a pattern in a text in O(n + m) time. It precomputes a failure function (also called prefix function or LPS array) that tells us the longest proper prefix of the pattern that is also a suffix. This allows skipping characters during mismatches instead of restarting from scratch.
When to Use
- ▸Finding all occurrences of a pattern in a text
- ▸Counting how many times a pattern appears in a string
- ▸Problems involving repeated patterns or periodic strings
- ▸When brute force O(n*m) string matching is too slow
Intuition
In naive matching, when a mismatch occurs, you restart from the beginning of the pattern. KMP is smarter — it asks 'what prefix of the pattern have I already matched that is also a suffix of what I've seen?' and jumps there instead. The failure function precomputes these jumps. For example, if the pattern is 'ABCABD' and you matched 'ABCAB' before failing, the failure function tells you 'AB' at the start matches 'AB' at the end, so jump to position 2 instead of 0.
Complexity
Template Code
#include<bits/stdc++.h>
using namespace std;
vector<int> buildLPS(string &pat){
int m = pat.size();
vector<int> lps(m, 0);
int len = 0;
int i = 1;
while(i < m){
if(pat[i] == pat[len]){
len++;
lps[i] = len;
i++;
}
else{
if(len != 0){
len = lps[len-1];
}
else{
lps[i] = 0;
i++;
}
}
}
return lps;
}
int main(){
ios_base::sync_with_stdio(0), cin.tie(0);
string text, pat;
cin >> text >> pat;
int n = text.size(), m = pat.size();
vector<int> lps = buildLPS(pat);
int cnt = 0;
int i = 0, j = 0;
while(i < n){
if(text[i] == pat[j]){
i++;
j++;
}
if(j == m){
cout << i - j << "\n";
cnt++;
j = lps[j-1];
}
else if(i < n && text[i] != pat[j]){
if(j != 0){
j = lps[j-1];
}
else{
i++;
}
}
}
cout << cnt << " occurrences\n";
return 0;
}
// Sample Input:
// ababababca
// abab
//
// Sample Output:
// 0
// 2
// 4
// 3 occurrences