KMP Algorithm
Knuth-Morris-Pratt string matching
The problem:
given a (short) pattern and a (long) text, both strings, determine whether the pattern appears somewhere in the text.
Define function
void patternSearch(String text, String pattern)
KMP algorithm preprocess the pattern, when we are search for the pattern in the text, the pointer of text doesn't look back, we alternate the pointer of pattern to the position of longest prefix/suffix and continue the search.
suppose we have text: "abcdabcdcbd" and pattern "abcdabcf
when we mismatch 'f' and 'd' at:
ptr1 ptr1
| |
abcdabc d cbd -> abcdabc d cbd
abcdabc f abc d abcf
| |
ptr2 new ptr2
We move ptr2 to 'd' since the longest prefix/suffix in substring "abcdabc" is "abc"
we process in this way for all the mismatch in pattern.
The time complexity of search for one pattern is O(m+n)
(m is text's length, n is pattern's length
)
Implementation
import java.util.Arrays;
public class kmp {
public void patternSearch(String s, String pattern){
System.out.println("Search String:"+s+"\t"+"Pattern:"+pattern);
if(s==null || pattern==null || s.length()<pattern.length()) {
System.out.println("No match found!");
return;
}
int sLen=s.length(), pLen=pattern.length();
int[] next=getNext(pattern);
System.out.println(Arrays.toString(next));
int i=0, j=0;
while(i<sLen){
if(j==-1 || s.charAt(i)==pattern.charAt(j)){
i++;
j++;
}else{
j=next[j];
}
if(j==pLen){
System.out.println("Find at index "+(i-pLen));
j=0;
}
}
}
public int[] getNext(String Pattern){
int pLen=Pattern.length();
int[] next=new int[pLen];
//next[i] represents the longest proper prefix or suffix of pattern.substring(0,i)
next[0]=-1;
int i=0, j=-1;
while(i<pLen-1){
if(j==-1 || Pattern.charAt(i)==Pattern.charAt(j)) next[++i]=++j;
else j=next[j];
}
return next;
}
public static void main(String[] args){
kmp test=new kmp();
test.patternSearch("ABABDABACDABABCABAB", "ABCABC");
}
}