Is Subsequence


Question:

Given a string s and a string t, check if s is subsequence of t.

You may assume that there is only lower case English letters in both s and t. t is potentially a very long string, and s is a short string.

Follow up: If there are lots of incoming S, say S1, S2, ... , Sk where k >= 1Billion, and you want to check one by one to see if T has its subsequence. In this scenario, how would you optimize?


Analysis:

To check if String s is a subsequence of String t, we can easily use two pointers and scan s and t at the same time. We move the pointer of s forward once we find a matched character in t. We find s is subsequence of t if we matches all the characters in s, or not if we can't match one character of s in String t.

The time Complexity is O(sLen+tLen).

We can see that the complexity is bad when the function is applied in the follow up scenario.

To optimize the solution, we have to notice that when we apply this function to multiple string s, we are doing search in the same pattern. We try to match character s[i] of String s with character t[j] of String t, suppose we matches at that position, then we continue to search for the next index next_j that meet the requirement that t[next_j]=s[next_i] and next_j>j.

So we can record all the index of all alphabet in String t, use variable matchIndex to record the index in String t where we have t[matchIndex] == s[i]. When we move to index i+1 in String s, we are going to find the next smallest matchIndex from all the position that are all of character s[i+1].

To explain this idea, I use the simple example below:

Suppose s = cab   
        t= aaabcaab

we record all the index of all alphabet in String t as:
character index
a 0,1,2,5,6
b 3,7
c 4
Initialize matchIndex = -1
match c in s: matchIndex -> 4
match a in s: matchIndex -> 5
match b in s: matchIndex -> 7
We match s with t

The time complexity to process n number of String s is

Preprocess String t: O(tLen)
Process n# of String s: O(n*L*lgx)
                        x is the average size of character list in String t
                        L is the average length of String s

Solution:

public boolean isSubsequence(String s, String t) {
        if(s.length()==0) return true;
        int tLen=t.length();
        ArrayList<Integer> [] map =  new ArrayList[26];
        for(int i=0;i<tLen;i++){
            int index = t.charAt(i)-'a';
            if(map[index]==null) map[index]=new ArrayList<Integer>();
            map[index].add(i);
        }
        int matchIndex=-1;
        for(char x : s.toCharArray()){
            // Try to find the smallest index that's larger than last matchIndex in the list of matched character
            if(map[x-'a']==null) return false;
            int insertIndex = Collections.binarySearch(map[x-'a'], matchIndex);
            if(insertIndex>=0){
                insertIndex++;
            }else{
                insertIndex = -(insertIndex+1);
            }
            if(insertIndex>=map[x-'a'].size()) return false;
            matchIndex = map[x-'a'].get(insertIndex);
        }
        return true;
    }

Better Time complexity with sacrifice of Space complexity

We can create a direction map for all the characters of String t. The direction map for character t[j] means the next index j in String t that we can match with the next character in String s.

For the same sample:

     s = cab   
     t= aaabcaab

 We create the direction map for all characters in String t
char\index -1 0 1 2 3 4 5 6 7
a a a b c a a b
a 0 1 2 5 5 5 6 -1 -1
b 3 3 3 3 7 7 7 7 -1
c 4 4 4 4 4 -1 -1 -1 -1
:
z -1 -1 -1 -1 -1 -1 -1 -1 -1
To check whether String s `cab` is subsequence of String t `aaabcaab`:

Start from matchIndex = -1
search for c: matchIndex -> 4
search for a: matchIndex -> 5
search for b: matchIndex -> 6

We match s with t

The time complexity to process n number of String s is

Preprocess String t: O(tLen)
Process n# of String s: O(n*L)
                        L is the average length of String s

results matching ""

    No results matching ""