字符串匹配算法之Sunday算法

添加时间：2016-12-20 17:06:27 文章作者：济南软件开发浏览次数：

字符串匹配查找算法中，最著名的两个是KMP算法（Knuth-Morris-Pratt)和BM算法（Boyer-Moore)。两个算法在最坏情况下均具有线性的查找时间。但是在实用上，KMP算法并不比最简单的C库函数strstr()快多少，而BM算法则往往比KMP算法快上3－5倍(未亲身实践)。但是BM算法还不是最快的算法，这里介绍一种比BM算法更快一些的查找算法Sunday算法。

Sunday算法的思想和BM算法中的坏字符思想非常类似。差别只是在于Sunday算法在匹配失败之后，是取目标串中当前和Pattern字符串对应的部分后面一个位置的字符来做坏字符匹配。

当发现匹配失败的时候就判断母串中当前偏移量+Pattern字符串长度+1处(假设为K位置)的字符在Pattern字符串中是否存在。如果存在，则将该位置和Pattern字符串中的该字符对齐，再从头开始匹配；如果不存在，就将Pattern字符串向后移动，和母串k+1处的字符对齐，再进行匹配。重复上面的操作直到找到，或母串被找完结束。

动手写了个小例子来实现以下这个算法。

在代码中，实现了两种字符串匹配算法，一种是Sunday方式，一种是普通的每次移动一位的方式，二者的效率对比在main函数中有，都是纳秒级别。算法的详细步骤，在代码中已经添加了相应的注释。关于BM算法，下次空了再一起对照着分析。

1 import java.util.HashMap;

2 import java.util.LinkedList;

3 import java.util.List;

4 import java.util.Map;

6 /**

7 * @author Scott

8 * @date 2013年12月28日

9 * @description

10 */

11 public class SundySearch {

12 String text = null;

13 String pattern = null;

14 int currentPos = 0;

16 /**

17 * 匹配后的子串第一个字符位置列表

18 */

19 List matchedPosList = new LinkedList();

21 /**

22 * 匹配字符的Map,记录改匹配字符串有哪些char并且每个char最后出现的位移

23 */

24 Map map = new HashMap();

26 public SundySearch(String text, String pattern) {

27 this.text = text;

28 this.pattern = pattern;

29 this.initMap();

30 };

32 /**

33 * Sunday匹配时，用来存储Pattern中每个字符最后一次出现的位置，从左到右的顺序

34 */

35 private void initMap() {

36 for (int i = 0; i < pattern.length(); i++) {

37 this.map.put(pattern.charAt(i), i);

39 }

40 }

42 /**

43 * 普通的字符串递归匹配，匹配失败就前进一位

44 */

45 public List normalMatch() {

46 //匹配失败，继续往下走

47 if (!matchFromSpecialPos(currentPos)) {

48 currentPos += 1;

50 if ((text.length() - currentPos) < pattern.length()) {

51 return matchedPosList;

52 }

53 normalMatch();

54 } else {

55 //匹配成功，记录位置

56 matchedPosList.add(currentPos);

57 currentPos += 1;

58 normalMatch();

59 }

61 return matchedPosList;

62 }

64 /**

65 * Sunday匹配，假定Text中的K字符的位置为：当前偏移量+Pattern字符串长度+1

66 */

67 public List sundayMatch() {

68 // 如果没有匹配成功

69 if (!matchFromSpecialPos(currentPos)) {

70 // 如果Text中K字符没有在Pattern字符串中出现，则跳过整个Pattern字符串长度

71 if ((currentPos + pattern.length() + 1) < text.length()

72 && !map.containsKey(text.charAt(currentPos + pattern.length() + 1))) {

73 currentPos += pattern.length();

74 }else {

75 // 如果Text中K字符在Pattern字符串中出现，则将Text中K字符的位置和Pattern字符串中的最后一次出现K字符的位置对齐

76 if ((currentPos + pattern.length() + 1) > text.length()) {

77 currentPos += 1;

78 } else {

79 currentPos += pattern.length() - (Integer) map.get(text.charAt(currentPos + pattern.length()));

80 }

81 }

83 // 匹配完成，返回全部匹配成功的初始位移

84 if ((text.length() - currentPos) < pattern.length()) {

85 return matchedPosList;

86 }

88 sundayMatch();

89 }else {

90 // 匹配成功前进一位然后再次匹配

91 matchedPosList.add(currentPos);

92 currentPos += 1;

93 sundayMatch();

94 }

95 return matchedPosList;

96 }

98 /**

99 * 检查从Text的指定偏移量开始的子串是否和Pattern匹配

100 */

101 public boolean matchFromSpecialPos(int pos) {

102 if ((text.length()-pos) < pattern.length()) {

103 return false;

104 }

105

106 for (int i = 0; i < pattern.length(); i++) {

107 if (text.charAt(pos + i) == pattern.charAt(i)) {

108 if (i == (pattern.length()-1)) {

109 return true;

110 }

111 continue;

112 } else {

113 break;

114 }

115 }

116

117 return false;

118 }

119

120 public static void main(String[] args) {

121 SundySearch sundySearch = new SundySearch("hello 啊啊阿道夫 adfsadfklf adf234masdfsdfdsfdsfdsffwerwrewrerwerwersdf2666sdflsdfk", "adf");

122

123 long begin = System.nanoTime();

124 System.out.println("NormalMatch:" + sundySearch.normalMatch());

125 System.out.println("NormalMatch:" + (System.nanoTime() - begin));

126

127 begin = System.nanoTime();

128 System.out.println("SundayMatch:" + sundySearch.sundayMatch());

129 System.out.println("SundayMatch:" + (System.nanoTime() - begin));

130

131 }

132 }

运行结果：

NormalMatch:[13, 17, 24]

NormalMatch:313423

SundayMatch:[13, 17, 24]

SundayMatch:36251

使用Sunday算法要比普通的匹配算法快了10倍左右，虽然是纳秒级别~因为我们匹配的内容就那么短，内容越长，效果就会越明显。

济南软件开发山东软件开发

字符串匹配算法之Sunday算法