KMP算法,及其例题

KMP算法

是一种改进的字符串匹配算法,由D.E.Knuth,J.H.Morris和V.R.Pratt同时发现,因此人们称它为克努特——莫里斯——普拉特操作(简称KMP算法)。
目的:KMP算法的关键是利用匹配失败后的信息,尽量减少模式串与主串的匹配次数以达到快速匹配的目的。具体实现就是实现一个next()函数,函数本身包含了模式串的局部匹配信息。

时间复杂度O(m+n)

第一步,找匹配串中的重复情况,用于确定匹配失败后匹配串回溯位置。
比如在这里插入图片描述
这些线都表示每一位匹配失败后回溯到的位置。并存入Next[ ]数组。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
int Next[10000000];
void getnext()
{
int j = 0, k = -1;
Next[0] = -1;
int len = p.size();
while (j < len)
{
if ((k == -1) || (p[k] == p[j])) {
k++;
j++;
Next[j] = k;
}else {
k = Next[k];
}
}
}

然后就是搜索匹配了,主串和匹配串指针一起走,当匹配失败后,主串指针不动,匹配串指针回溯到Next[ ]中记录的位置。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
int cont=0;
void KMP(string t,string p)//t为主串,p为匹配串
{
int i = 0, j = 0;
int len1 = t.size(), len2 = p.size();

while (i < len1)//主串全部匹配完
{
if ((j == -1) || (t[i] == p[j])) {//上一次匹配失败或当前位置匹配成功
i++;//主串指针右移
j++;//匹配串指针右移
}else {
j = Next[j];//匹配失败,匹配串指针回溯
}
if (j == len2) {
cont++;//匹配串完全匹配成功,匹配次数++;
}
}
}

例题1:

Oulipo

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter ‘e’. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive ‘T’s is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {‘A’, ‘B’, ‘C’, …, ‘Z’} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input
1
2
3
4
5
6
7
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
2
3
1
3
0

题目很长,其实就是KMP的模板题,让你找a串在b串中出现了几次。

AC代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
#include <bits/stdc++.h>
using namespace std;
string t, p;
int cont = 0;
int Next[10000000];
void getnext()
{
int j = 0, k = -1;

Next[0] = -1;
int len = p.size();
while (j < len)
{
if ((k == -1) || (p[k] == p[j])) {
k++;
j++;
Next[j] = k;
}else {
k = Next[k];
}
}
}
void KMP()
{
int i = 0, j = 0;
int len1 = t.size(), len2 = p.size();

while (i < len1)
{
if ((j == -1) || (t[i] == p[j])) {
i++;
j++;
}else {
j = Next[j];
}
if (j == len2) {
cont++;
}
}
}
int main()
{
cin>>t>>p;
getnext();
KMP();
if (cont != 0) {
cout<<"yes"<<endl;
cout<<cont<<endl;
}else {
cout<<"NO"<<endl;
}
return (0);
}

例题2:

Power Strings

Given two strings a and b we define ab to be their concatenation. For example, if a = “abc” and b = “def” then ab = “abcdef”. If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = “” (the empty string) and a^(n+1) = a*(a^n).
Input
Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.

Output

For each s you should print the largest n such that s = a^n for some string a.

Sample Input
1
2
3
4
abcd
aaaa
ababab
.
Sample Output
1
2
3
1
4
3
Hint

This problem has huge input, use scanf instead of cin to avoid time limit exceed.
题目大意: 寻找字符串中循环节的数量。
考察点: Next[ ]数组的原理理解和应用
思路: 我们可以从KMP的Next[ ]数组的原理上很容易得到,其实最后一位向前回溯的长度就是循环节的长度,所以只需要用len/(len-Next[len-1])

AC代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#include<iostream>
#include <string.h>
#include <stdlib.h>
using namespace std;
int Next[1000010];
int getnext(string p)
{
int j = 0, k = -1;
Next[0] = -1;
int len = p.size();
int cont=0;
while (j < len)
{
if ((k == -1) || (p[k] == p[j])) {
k++;
j++;
Next[j] = k;
}else {
k = Next[k];
}
}
if(len%(j-k)!=0)
return 1;
else
return (len/(j-k));
}
int main()
{
string s;
while(cin>>s&&s!=".")
{
memset(Next,0,sizeof(Next));
cout<<getnext(s)<<endl;
}
}

例题3:

Period

For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A K ,that is A concatenated K times, for some string A. Of course, we also want to know the period K.

Input

The input consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1 000 000) – the size of the string S.The second line contains the string S. The input file ends with a line, having the
number zero on it.

Output

For each test case, output “Test case #” and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case.

Sample Input
1
2
3
4
5
3
aaa
12
aabaabaabaab
0
Sample Output
1
2
3
4
5
6
7
8
9
Test case #1
2 2
3 3

Test case #2
2 2
6 2
9 3
12 4

题目大意:一个字符串,如果一个循环节出现两次或以上次数,则输出当前位置和循环节出现次数,比如样例中的aabaabaabaab,在第二位,循环节是a,出现了两次所以输出2 2,然后到第六位的时候,循环节aab出现了两次,所以是6 2,9 3, 12 4 同理。
AC代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#include<iostream>
#include<algorithm>
#include<string>
#include<string.h>
using namespace std;
const int maxn=1000000+5;
int Next[maxn];
int len;
void getnext(string a)
{
int j=0,k=-1;
Next[0]=-1;
while(j<len)
{
if(k==-1||a[j]==a[k])
{
j++;
k++;
Next[j]=k;
}
else
{
k=Next[k];
}
}
}
int main()
{
string s;
int num=1;
while(1)
{
cin>>len;
if(len==0)
break;
memset(Next,0,sizeof(Next));
cin>>s;
getnext(s);
cout<<"Test case #"<<num<<"\n";
for(int i=2;i<=len;i++)
{
if(i%(i-Next[i])==0&&i/(i-Next[i])>=2)//这个题的精华所在,
cout<<i<<" "<<i/(i-Next[i])<<endl;
}
cout<<"\n";
num++;
}
}

坚持原创技术分享,您的支持也将成为我的动力!
-------------本文结束感谢您的阅读-------------
undefined