博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
快速生成后缀树的McCreight算法及其实现
阅读量:4355 次
发布时间:2019-06-07

本文共 23384 字,大约阅读时间需要 77 分钟。

快速生成后缀树的McCreight算法及其实现

作者: ljs
2011-07-03
(版权所有,转载请注明)
McCreight 算法(简称mcc算法)是基于蛮力法,即已知输入文本串T的内容(注:Ukkonen算法是online的,所以不要求事先知道T的全部内容),逐步缩短 插入到树中的后缀长度,直到将最后一个后缀(等于末尾那个字符)插入到前面已经生成的树中为止。它与蛮力法的区别是,T的最后一个字符必须与前面的n-1 个字符中的任何一个字符不同(n是T的长度),换句话说,T的最后一个字符不属于字母表(希腊字母大写SIGMA)中任何字符,这样生成的Suffix Tree的特点是,所有的后缀都终止于叶子结点,而且每个叶子结点必定对应一个后缀。也就是说,任何内部结点都不会是后缀的终止结点。这个要求是 McCreight算法和Ukkonen算法的假设前提。

mcc算法的核心思想是suffix link(后缀连接)和head/tail的概念。所谓结点X的suffix link指向的结点Y,指的是如果从根结点出发到X结点终止时的字符串等于xW(其中小写字母x表示单个字符,W表示一个字符串),那么从根结点出发到Y 结点终止时的字符串等于W。head[i]指的是后缀树T中,Suffix[i]与所有后缀共享的前缀中最长的前缀。

2011070317430522.gif

mcc 算法基本流程可以描述如下:如上图所示,这里树的左枝对应插入后缀Suffix[i-1]之后的效果,v和u都是内部结点,其中v是head[i-1]对 应的内部结点,u是该树枝中v的上一个内部结点(u最接近v,也可以等于v)。接下来在插入Suffix[i]时,先沿着u的suffix link(因为在插入完Suffix[i-1]之后,除了v可能没有suffix link外,其余的内部结点都有suffix link - 这一点可以用归纳法证明),找到树T[i-1]中的内部结点s(u)(注意:suffix link指向的结点一定都是内部结点)。这时开始进行插入Suffix[i]的操作,接下来分两步完成插入第i个后缀,第一步使用快速扫描(fast scan),沿着s(u)结点往树叶方向搜索,直到找到w结点为止,这个w结点就是v的suffix link应该指向的结点(但是此时很有可能这个suffix link还不存在),建立v到w的suffix link;第二步在找到w的基础上,使用慢速扫描(slow scan),即沿着w结点往树叶方向搜索,直到找到head[i]为止。这时就可以结束插入Suffix[i]的工作。需要注意,在fastscan和 slowscan中都需要记录u'的结点位置,这样在插入下一个后缀Suffix[i+1]时可以快速jump到s(u'),从结点s(u')开始,而不 需要像蛮力法那样从root结点去搜索了,这就是为什么mcc算法能够达到O(n)线性复杂度的原因。
mcc算法执行过程中,需要注意几点:
1)suffix link指向的结点一定是内部结点(包括root,别忘了root也是内部结点)。因为u结点永远都是内部结点,所以不需要给叶子结点(tail)建立suffix link。
2) 只有head结点可能没有suffix link,但是其它的内部结点都已经有了指向另外一个内部结点的suffix link。如果head结点还没有suffix link, 在插入下一个后缀时该head结点的suffix link会被建立。数学归纳法可证明,除了当前的head[i]结点以外的任何内部结点都一定有suffix link。
3)fast scan的目标是找w,slow scan的目标是找head[i];在fast scan结束时,找到的w结点一定是一个内部结点(即有两个或以上孩子的分岔结点)。
4) 不管是在fast scan还是slow scan阶段中,一旦新建了一个内部结点(该结点可能w,也可能是head[i]),那么应该立即结束当前后缀的插入工作。如果新建的内部结点是w,那么 它也一定是head[i]结点;如果新建的内部结点不是w结点,那么说明w是已经存在的一个内部结点。
5) tail[i]永远不空,因为文本串是T=S$这样的形式(见上面解释)。
mcc算法实现:

import java.util.LinkedList;import java.util.List;/** *  * Build Suffix Tree using McCreight Algorithm *   * Copyright (c) 2011 ljs (http://blog.csdn.net/ljsspace/) * Licensed under GPL (http://www.opensource.org/licenses/gpl-license.php)  *  * @author ljs * 2011-07-03 * */public class McCreightAlgorithm {	private class SuffixNode {				private String text;			    private List
children = new LinkedList
(); private SuffixNode link; private int start; private int end; private int pathlen; public SuffixNode(String text,int start,int end,int pathlen){ this.text = text; this.start = start; this.end = end; this.pathlen = pathlen; } public SuffixNode(String text){ this.text = text; this.start = -1; this.end = -1; this.pathlen = 0; } public int getLength(){ if(start == -1) return 0; else return end - start + 1; } public String getString(){ if(start != -1){ return this.text.substring(start,end+1); }else{ return ""; } } public boolean isRoot(){ return start == -1; } public String getCoordinate(){ return "[" + start+".." + end + "/" + this.pathlen + "]"; } public String toString(){ return getString() + "(" + getCoordinate() + ",link:" + ((this.link==null)?"N/A":this.link.getCoordinate()) + ",children:" + children.size() +")"; } } private class State{ private SuffixNode u; //parent(head) private SuffixNode w; //s(head[i-1]) private SuffixNode v; //head[i-1] private int j; //the global index of text starting from 0 to text.length() private boolean finished; //is this suffix insertion finished? } private SuffixNode root; private String text; public McCreightAlgorithm(String text){ this.text = text; } //build a suffix-tree for a string of text private void buildSuffixTree() throws Exception{ if(root==null){ root = new SuffixNode(text); root.link = root; //link to itself } SuffixNode u = root; SuffixNode v = root; State state = new State(); for(int i=0;i
0) { fastscan(state,s,uvLen,j); } //establish the suffix link with v SuffixNode w = state.w; v.link = w; //execute slow scan if(!state.finished){ j = state.j; state.u = w; //w must be an internal node when state.finished=false, then it must have a suffix link, so u can be updated. slowscan(state,w,j); } u = state.u; v = state.v; } } //slow scan until head(=state.v) is found private void slowscan(State state,SuffixNode currNode,int j){ boolean done = false; int keyLen = text.length() - j; for(int i=0;i
/ | \ // e f insert "c^" c^ e f int pathlen = text.length() - j + currNode.pathlen; SuffixNode node = new SuffixNode(text,j,text.length()-1,pathlen); currNode.children.add(i,node); //state.u = currNode; //currNode is already registered as state.u, so commented out state.v = currNode; state.finished = true; done = true; break; }else{ //key.charAt(0)>child.key.charAt(0) //don't forget to add the largest new key after iterating all children continue; } }else{//current child's key partially matches with the new key if(delta==len){ if(keyLen>childKeyLen){ //suffix tree with ^ ending can't have other two cases //e.g. child="ab" // ab ab // / \ ==========> / | \ // e f insert "abc^" c^ e f //recursion state.u = child; j += childKeyLen; state.j = j; slowscan(state,child,j); } }else{//0
/ \ // e f insert "abd^" c d^ // / \ // e f //insert the new node: ab int nodepathlen = child.pathlen - (child.getLength()-delta); SuffixNode node = new SuffixNode(text, child.start,child.start + delta - 1,nodepathlen); node.children = new LinkedList
(); int tailpathlen = (text.length() - (j + delta)) + nodepathlen; SuffixNode tail = new SuffixNode(text, j+delta,text.length()-1,tailpathlen); //update child node: c child.start += delta; if(text.charAt(j+delta)
/ \ // e f suffix part: "abd^" c d^ // / \ // e f //insert the new node: ab; child is now c int nodepathlen = child.pathlen - (child.getLength()-uvLen); SuffixNode node = new SuffixNode(text, child.start,child.start + uvLen - 1,nodepathlen); node.children = new LinkedList
(); int tailpathlen = (text.length() - (j + uvLen)) + nodepathlen; SuffixNode tail = new SuffixNode(text, j+uvLen,text.length()-1,tailpathlen); //update child node: c child.start += uvLen; if(text.charAt(j+uvLen)
len //e.g. child="abc", uvLen = 4 // abc // / \ ================> // e f suffix part: "abcdefg^" // // //jump to next node uvLen -= len; state.u = child; j += len; state.j = j; fastscan(state,child,uvLen,j); } break; } } } //for test purpose only public void printTree(){ System.out.format("The suffix tree for S = %s is: %n",this.text); this.print(0, this.root); } private void print(int level, SuffixNode node){ for (int i = 0; i < level; i++) { System.out.format(" "); } System.out.format("|"); for (int i = 0; i < level; i++) { System.out.format("-"); } System.out.format("%s(%d..%d/%d)%n", node.getString(),node.start,node.end,node.pathlen); //System.out.format("(%d,%d)%n", node.start,node.end); for (SuffixNode child : node.children) { print(level + 1, child); } } public static void main(String[] args) throws Exception { //test suffix-tree System.out.println("****************************"); String text = "xbxb^"; //the last char must be unique! McCreightAlgorithm stree = new McCreightAlgorithm(text); stree.buildSuffixTree(); stree.printTree(); System.out.println("****************************"); text = "mississippi^"; stree = new McCreightAlgorithm(text); stree.buildSuffixTree(); stree.printTree(); System.out.println("****************************"); text = "GGGGGGGGGGGGCGCAAAAGCGAGCAGAGAGAAAAAAAAAAAAAAAAAAAAAA^"; stree = new McCreightAlgorithm(text); stree.buildSuffixTree(); stree.printTree(); System.out.println("****************************"); text = "ABCDEFGHIJKLMNOPQRSTUVWXYZ^"; stree = new McCreightAlgorithm(text); stree.buildSuffixTree(); stree.printTree(); System.out.println("****************************"); text = "AAAAAAAAAAAAAAAAAAAAAAAAAA^"; stree = new McCreightAlgorithm(text); stree.buildSuffixTree(); stree.printTree(); System.out.println("****************************"); text = "minimize"; //the last char e is different from other chars, so it is ok. stree = new McCreightAlgorithm(text); stree.buildSuffixTree(); stree.printTree(); System.out.println("****************************"); //the example from McCreight's: A Space-Economical Suffix Tree Construction Algorithm text = "bbbbbababbbaabbbbbc^"; stree = new McCreightAlgorithm(text); stree.buildSuffixTree(); stree.printTree(); }}

测试输出:

****************************The suffix tree for S = xbxb^ is: |(-1,-1) |-(4,4) |-(1,1)  |--(4,4)  |--(2,4) |-(0,1)  |--(4,4)  |--(2,4)****************************The suffix tree for S = mississippi^ is: |(-1,-1) |-(11,11) |-(1,1)  |--(11,11)  |--(8,11)  |--(2,4)   |---(8,11)   |---(5,11) |-(0,11) |-(8,8)  |--(10,11)  |--(9,11) |-(2,2)  |--(4,4)   |---(8,11)   |---(5,11)  |--(3,4)   |---(8,11)   |---(5,11)****************************The suffix tree for S = GGGGGGGGGGGGCGCAAAAGCGAGCAGAGAGAAAAAAAAAAAAAAAAAAAAAA^ is: |(-1,-1) |-(15,15)  |--(16,16)   |---(17,17)    |----(18,18)     |-----(35,35)      |------(36,36)       |-------(37,37)        |--------(38,38)         |---------(39,39)          |----------(40,40)           |-----------(41,41)            |------------(42,42)             |-------------(43,43)              |--------------(44,44)               |---------------(45,45)                |----------------(46,46)                 |-----------------(47,47)                  |------------------(48,48)                   |-------------------(49,49)                    |--------------------(50,50)                     |---------------------(51,51)                      |----------------------(52,53)                      |----------------------(53,53)                     |---------------------(53,53)                    |--------------------(53,53)                   |-------------------(53,53)                  |------------------(53,53)                 |-----------------(53,53)                |----------------(53,53)               |---------------(53,53)              |--------------(53,53)             |-------------(53,53)            |------------(53,53)           |-----------(53,53)          |----------(53,53)         |---------(53,53)        |--------(53,53)       |-------(53,53)      |------(53,53)     |-----(19,53)     |-----(53,53)    |----(19,53)    |----(53,53)   |---(19,53)   |---(53,53)  |--(19,19)   |---(27,27)    |----(32,53)    |----(28,29)     |-----(32,53)     |-----(30,53)   |---(20,20)    |----(25,53)    |----(21,53)  |--(53,53) |-(12,12)  |--(15,15)   |---(16,53)   |---(26,53)  |--(13,13)   |---(22,53)   |---(14,53) |-(0,0)  |--(22,22)   |---(32,53)   |---(23,23)    |----(29,29)     |-----(32,53)     |-----(30,53)    |----(24,53)  |--(12,12)   |---(15,15)    |----(16,53)    |----(26,53)   |---(13,13)    |----(22,53)    |----(14,53)  |--(1,1)   |---(12,53)   |---(2,2)    |----(12,53)    |----(3,3)     |-----(12,53)     |-----(4,4)      |------(12,53)      |------(5,5)       |-------(12,53)       |-------(6,6)        |--------(12,53)        |--------(7,7)         |---------(12,53)         |---------(8,8)          |----------(12,53)          |----------(9,9)           |-----------(12,53)           |-----------(10,10)            |------------(12,53)            |------------(11,53) |-(53,53)****************************The suffix tree for S = ABCDEFGHIJKLMNOPQRSTUVWXYZ^ is: |(-1,-1) |-(0,26) |-(1,26) |-(2,26) |-(3,26) |-(4,26) |-(5,26) |-(6,26) |-(7,26) |-(8,26) |-(9,26) |-(10,26) |-(11,26) |-(12,26) |-(13,26) |-(14,26) |-(15,26) |-(16,26) |-(17,26) |-(18,26) |-(19,26) |-(20,26) |-(21,26) |-(22,26) |-(23,26) |-(24,26) |-(25,26) |-(26,26)****************************The suffix tree for S = AAAAAAAAAAAAAAAAAAAAAAAAAA^ is: |(-1,-1) |-(0,0)  |--(1,1)   |---(2,2)    |----(3,3)     |-----(4,4)      |------(5,5)       |-------(6,6)        |--------(7,7)         |---------(8,8)          |----------(9,9)           |-----------(10,10)            |------------(11,11)             |-------------(12,12)              |--------------(13,13)               |---------------(14,14)                |----------------(15,15)                 |-----------------(16,16)                  |------------------(17,17)                   |-------------------(18,18)                    |--------------------(19,19)                     |---------------------(20,20)                      |----------------------(21,21)                       |-----------------------(22,22)                        |------------------------(23,23)                         |-------------------------(24,24)                          |--------------------------(25,26)                          |--------------------------(26,26)                         |-------------------------(26,26)                        |------------------------(26,26)                       |-----------------------(26,26)                      |----------------------(26,26)                     |---------------------(26,26)                    |--------------------(26,26)                   |-------------------(26,26)                  |------------------(26,26)                 |-----------------(26,26)                |----------------(26,26)               |---------------(26,26)              |--------------(26,26)             |-------------(26,26)            |------------(26,26)           |-----------(26,26)          |----------(26,26)         |---------(26,26)        |--------(26,26)       |-------(26,26)      |------(26,26)     |-----(26,26)    |----(26,26)   |---(26,26)  |--(26,26) |-(26,26)****************************The suffix tree for S = minimize is: |(-1,-1) |-(7,7) |-(1,1)  |--(4,7)  |--(2,7)  |--(6,7) |-(0,1)  |--(2,7)  |--(6,7) |-(2,7) |-(6,7)****************************The suffix tree for S = bbbbbababbbaabbbbbc^ is: |(-1,-1) |-(19,19) |-(5,5)  |--(12,19)  |--(6,6)   |---(7,19)   |---(9,10)    |----(11,19)    |----(16,19) |-(0,0)  |--(5,5)   |---(12,19)   |---(6,6)    |----(7,19)    |----(9,19)  |--(1,1)   |---(5,5)    |----(12,19)    |----(6,19)   |---(2,2)    |----(5,5)     |-----(12,19)     |-----(6,19)    |----(3,3)     |-----(5,19)     |-----(4,4)      |------(5,19)      |------(18,19)     |-----(18,19)    |----(18,19)   |---(18,19)  |--(18,19) |-(18,19)

参考资料:

EDWARD M. McCREIGHT, Journal of the Association for Computing Machinery, Vol 23, No. 2, April 1976, A Space-Economical Suffix Tree Construction Algorithm

import java.util.LinkedList;
import java.util.List;
/**
 *
 * Build Suffix Tree using McCreight Algorithm
 * 
 * Copyright (c) 2011 ljs (http://blog.csdn.net/ljsspace/)
 * Licensed under GPL (http://www.opensource.org/licenses/gpl-license.php)
 *
 * @author ljs
 * 2011-07-03
 *
 */
public class McCreightAlgorithm {
    private class SuffixNode {       
        private String text;
       
        private List<SuffixNode> children = new LinkedList<SuffixNode>();
       
        private SuffixNode link;
        private int start;
        private int end;
        private int pathlen;
       
        public SuffixNode(String text,int start,int end,int pathlen){   
            this.text = text;
            this.start = start;
            this.end = end;
            this.pathlen = pathlen;
        }
        public SuffixNode(String text){       
            this.text = text;
            this.start = -1;
            this.end = -1;       
            this.pathlen = 0;
        }
        public int getLength(){
            if(start == -1) return 0;
            else return end - start + 1;
        }
        public String getString(){
            if(start != -1){
                return this.text.substring(start,end+1);
            }else{
                return "";
            }
        }
        public boolean isRoot(){
            return start == -1;
        }
        public String getCoordinate(){
            return "[" + start+".." + end + "/" + this.pathlen + "]";
        }
        public String toString(){           
            return getString() + "(" + getCoordinate()
                + ",link:" + ((this.link==null)?"N/A":this.link.getCoordinate())
                + ",children:" + children.size() +")";
        }      
    }
   
    private class State{
        private SuffixNode u; //parent(head)
        private SuffixNode w; //s(head[i-1])
        private SuffixNode v; //head[i-1]
        private int j; //the global index of text starting from 0 to text.length()
        private boolean finished; //is this suffix insertion finished?
    }
   
    private SuffixNode root;
    private String text;
   
    public McCreightAlgorithm(String text){
        this.text = text;
    }
    //build a suffix-tree for a string of text
    private void buildSuffixTree() throws Exception{       
        if(root==null){
            root = new SuffixNode(text);       
            root.link = root; //link to itself
        }
               
        SuffixNode u = root;
        SuffixNode v = root;
        State state = new State();       
       
        for(int i=0;i<text.length();i++){
            //process each suffix
           
            SuffixNode s = u.link;
           
            int uvLen=v.pathlen - u.pathlen;         
            if(u.isRoot() && !v.isRoot()){
                uvLen--;
            }
            int j = s.pathlen + i;       
                       
            //init state
            state.u = s;           
            state.w = s; //if uvLen = 0
            state.v = s;
            state.j = j;
            state.finished = false;
           
            //execute fast scan
            if(uvLen > 0) {
                fastscan(state,s,uvLen,j);
            }
           
            //establish the suffix link with v   
            SuffixNode w = state.w;
            v.link = w;
           
            //execute slow scan
            if(!state.finished){
                j = state.j;               
                state.u = w; //w must be an internal node when state.finished=false, then it must have a suffix link, so u can be updated.
                slowscan(state,w,j);
            }       
           
            u = state.u;
            v = state.v;
        }
    }
    //slow scan until head(=state.v) is found
    private void slowscan(State state,SuffixNode currNode,int j){
        boolean done = false;       
        int keyLen = text.length() - j;
        for(int i=0;i<currNode.children.size();i++){
            SuffixNode child = currNode.children.get(i);
           
            //use min(child.key.length, key.length)           
            int childKeyLen = child.getLength();
            int len = childKeyLen<keyLen?childKeyLen:keyLen;
            int delta = 0;
            for(;delta<len;delta++){
                if(text.charAt(j+delta) != text.charAt(child.start+delta)){
                    break;
                }
            }
            if(delta==0){//this child doesn't match    any character with the new key           
                //order keys by lexi-order
                if(text.charAt(j) < text.charAt(child.start)){
                    //e.g. child="e" (currNode="abc")
                    //       abc                     abc
                    //    /  \    =========>      / | \
                    //   e    f   insert "c$"    c$ e  f
                    int pathlen = text.length() - j + currNode.pathlen;
                    SuffixNode node = new SuffixNode(text,j,text.length()-1,pathlen);
                    currNode.children.add(i,node);       
                    //state.u = currNode; //currNode is already registered as state.u, so commented out
                    state.v = currNode;
                    state.finished = true;
                    done = true;
                    break;                   
                }else{ //key.charAt(0)>child.key.charAt(0)
                    //don't forget to add the largest new key after iterating all children
                    continue;
                }
            }else{//current child's key partially matches with the new key   
                if(delta==len){
                    if(keyLen>childKeyLen){ //suffix tree with $ ending can't have other two cases
                        //e.g. child="ab"
                        //       ab                      ab
                        //    /  \    ==========>     / | \                            
                        //   e    f   insert "abc$"  c$ e  f       
                        //recursion
                        state.u = child;
                        j += childKeyLen;
                        state.j = j;
                        slowscan(state,child,j);
                    }
                }else{//0<delta<len
           
                    //e.g. child="abc"
                    //       abc                     ab
                    //    /  \     ==========>     / \
                    //   e    f   insert "abd$"   c  d$
                    //                           /  \
                    //                          e    f                   
                    //insert the new node: ab
                    int nodepathlen = child.pathlen
                            - (child.getLength()-delta);
                    SuffixNode node = new SuffixNode(text,
                            child.start,child.start + delta - 1,nodepathlen);
                    node.children = new LinkedList<SuffixNode>();
                   
                    int tailpathlen = (text.length() - (j + delta)) + nodepathlen;
                    SuffixNode tail = new SuffixNode(text,
                            j+delta,text.length()-1,tailpathlen);
                   
                    //update child node: c
                    child.start += delta;
                    if(text.charAt(j+delta)<text.charAt(child.start)){
                        node.children.add(tail);
                        node.children.add(child);
                    }else{
                        node.children.add(child);
                        node.children.add(tail);                           
                    }
                    //update parent
                    currNode.children.set(i, node);
                   
                    //state.u = currNode; //currNode is already registered as state.u, so commented out
                    state.v = node;
                    state.finished = true;                   
                }
                done = true;
                break;
            }
        }
        if(!done){
            int pathlen = text.length() - j + currNode.pathlen;
            SuffixNode node = new SuffixNode(text,j,text.length()-1,pathlen);
            currNode.children.add(node);
            //state.u = currNode; //currNode is already registered as state.u, so commented out
            state.v = currNode;   
            state.finished = true;
        }
    }
    //fast scan until w is found
    private void fastscan(State state,SuffixNode currNode,int uvLen,int j){         
       
        for(int i=0;i<currNode.children.size();i++){
            SuffixNode child = currNode.children.get(i);
           
            if(text.charAt(child.start) == text.charAt(j)){
                int len = child.getLength();
                if(uvLen==len){
                    //then we find w           
                    //uvLen = 0;                   
                    //need slow scan after this child
                    state.u = child;   
                    state.w = child;
                    state.j = j+len;
                }else if(uvLen<len){
                    //branching    and cut child short                               
                    //e.g. child="abc",uvLen = 2
                    //       abc                          ab
                    //    /  \    ================>     / \
                    //   e    f   suffix part: "abd$"  c   d$
                    //                                /  \
                    //                               e    f               
                   
                    //insert the new node: ab; child is now c
                    int nodepathlen = child.pathlen
                            - (child.getLength()-uvLen);
                    SuffixNode node = new SuffixNode(text,
                            child.start,child.start + uvLen - 1,nodepathlen);
                    node.children = new LinkedList<SuffixNode>();
                   
                    int tailpathlen = (text.length() - (j + uvLen)) + nodepathlen;
                    SuffixNode tail = new SuffixNode(text,
                            j+uvLen,text.length()-1,tailpathlen);
                   
                    //update child node: c
                    child.start += uvLen;
                    if(text.charAt(j+uvLen)<text.charAt(child.start)){
                        node.children.add(tail);
                        node.children.add(child);
                    }else{
                        node.children.add(child);
                        node.children.add(tail);                           
                    }
           
                    //update parent
                    currNode.children.set(i, node);
                   
                    //uvLen = 0;
                    //state.u = currNode; //currNode is already registered as state.u, so commented out
                    state.w = node;   
                    state.finished = true;
                    state.v = node;                   
                   
                }else{//uvLen>len
                    //e.g. child="abc", uvLen = 4
                    //       abc                         
                    //    /  \    ================>     
                    //   e    f   suffix part: "abcdefg$"  
                    //                               
                    //                 
                    //jump to next node
                    uvLen -= len;
                    state.u = child;
                    j += len;
                    state.j = j;
                    fastscan(state,child,uvLen,j);
                }
                break;
            }
        }       
    }
    //for test purpose only
    public void printTree(){
        System.out.format("The suffix tree for S = %s is: %n",this.text);
        this.print(0, this.root);
    }
    private void print(int level, SuffixNode node){
        for (int i = 0; i < level; i++) {
            System.out.format(" ");
        }
        System.out.format("|");
        for (int i = 0; i < level; i++) {
            System.out.format("-");
        }
        //System.out.format("%s(%d..%d/%d)%n", node.getString(),node.start,node.end,node.pathlen);
        System.out.format("(%d..%d/%d)%n", node.start,node.end,node.pathlen);
        for (SuffixNode child : node.children) {
            print(level + 1, child);
        }       
    }
    public static void main(String[] args) throws Exception {
        //test suffix-tree
        System.out.println("****************************");       
        String text = "xbxb$"; //the last char must be unique!
        McCreightAlgorithm stree = new McCreightAlgorithm(text);
        stree.buildSuffixTree();
        stree.printTree();
       
        System.out.println("****************************");       
        text = "mississippi$";
        stree = new McCreightAlgorithm(text);
        stree.buildSuffixTree();
        stree.printTree();
       
        System.out.println("****************************");       
        text = "GGGGGGGGGGGGCGCAAAAGCGAGCAGAGAGAAAAAAAAAAAAAAAAAAAAAA$";
        stree = new McCreightAlgorithm(text);
        stree.buildSuffixTree();
        stree.printTree();
       
        System.out.println("****************************");       
        text = "ABCDEFGHIJKLMNOPQRSTUVWXYZ^";
        stree = new McCreightAlgorithm(text);
        stree.buildSuffixTree();
        stree.printTree();
        System.out.println("****************************");       
        text = "AAAAAAAAAAAAAAAAAAAAAAAAAA^";
        stree = new McCreightAlgorithm(text);
        stree.buildSuffixTree();
        stree.printTree();
       
        System.out.println("****************************");       
        text = "minimize";  //the last char e is different from other chars, so it is ok.
        stree = new McCreightAlgorithm(text);
        stree.buildSuffixTree();
        stree.printTree();
       
        System.out.println("****************************");       
        text = "xabxa$";
        stree = new McCreightAlgorithm(text);
        stree.buildSuffixTree();
        stree.printTree();
    }
}

转载于:https://www.cnblogs.com/ljsspace/archive/2011/07/03/2096779.html

你可能感兴趣的文章
Kinect 开发 —— 进阶指引(上)
查看>>
python学习笔记(六)time、datetime、hashlib模块
查看>>
uva489(需要考虑周全)
查看>>
C-关键字(二)
查看>>
排序笔记
查看>>
下载360doc.com里的文章
查看>>
【转】globk和glorg中使用的apr文件
查看>>
导航,头部,CSS基础
查看>>
PostMessage 解析
查看>>
Java语法基础(一)
查看>>
as3 sort
查看>>
hdu 2680 Choose the best route Dijkstra 虚拟点
查看>>
26. Remove Duplicates from Sorted Array java solutions
查看>>
[bzoj1185] [HNOI2007]最小矩形覆盖
查看>>
全景图制作详解
查看>>
React之todo-list
查看>>
cocoapods降级版本
查看>>
MYSQL复习笔记4-基本SQL语句
查看>>
C#&java重学笔记(函数)
查看>>
14软件G2班
查看>>