String Searching in O(N)

許友綸
1 min readJan 20, 2021

For string searching, it is easy to come up with a naive solution for string searching

However the naive solution takes O(MN), where N and M denote the length of str and pattern respectively. And often, we meet the worst case.

str=aaaaaaaaaaaaaaaa; pattern=aaaaaaaab

for each position i in str, we need to compare the pattern ‘til the end.

The intuition is that we have known that the prefix a* is already compared and matched. We only need to make sure if the next one is b. Take a look at more example

str=abcd[e]; pattern=abcd[f]
// strstr(str, pattern) == [];

after comparing e and f, is it really necessary to restart the comparison at position 1? NO, we can forward the starting position to the position of e. In the sense that we need to take care of the repeated structure of the pattern string.

Knuth–Morris–Pratt Algorithm

  1. Preprocess the pattern, calculate lps array
lps[i] := the longest length of the proper prefix of str[0…i] that matches the postfix of str[0…i].
“proper prefix” means all prefixes excluding the whole string.

For example lps(“AABAAB”) == [0, 1, 0, 1, 2, 3], lps(“AAA”) = [0, 1, 2]

Calculate lsp is trivial. We can use dynamic programming here. I’ve seen so many O(M²) code.

2. KMP match

initialize i=0 and j=0, calculate lps(pattern)while i<len(str)
if str[i] == pattern[j] // MATCH
j++
else if j > 0 // MISMATCH and j>0
j = lps[j-1]
continue
if j >= len(pattern) // A COMPLETE MATCH
yield i-j+1
i++

--

--