Find All the Start Indicies of an Anagram in a Separate String

Given two strings s and p, find all the start indices of p’s anagrams in s.

Example 1:

Input: s = “cbaebabacd”, p = “abc”
Output: [0,6]
Explanation: The substring with start index = 0 is “cba”, which is an anagram of “abc”. The substring with start index = 6 is “bac”, which is an anagram of “abc”.

Example 2:

Input: s = “abab”, p = “ab”
Output: [0,1,2]
Explanation: The substring with start index = 0 is “ab”, which is an anagram of “ab”. The substring with start index = 1 is “ba”, which is an anagram of “ab”. The substring with start index = 2 is “ab”, which is an anagram of “ab”.

Approach 1: Sorting

Intuition

Since the anagrams contain same letters with same frequencies, we can sort the strings and check if they are equal. If they are equal, the strings are anagram to each other.

Implementation

vector<int> findAnagrams(string s, string p) {
    int n = s.length(), m = p.length();
    vector<int> ans;
    sort(p.begin(), p.end());
    for (int i = 0; i <= n - m; ++i) {
        string cur = s.substr(i, m);
        sort(cur.begin(), cur.end());
        // if sorted values match, we have found an anagram
        if (cur == p) {
            ans.push_back(i);
        }
    }
    return ans;
}

Complexity Analysis

Time Complexity: $O(NMlogM)$ where N and M are length of s and p respectively
Space Complexity: $O(M)$ extra for the substring we are extracting at each step

Approach 2: Counting the Frequency of Characters

Intuition

By definition, anagrams have same letters with same frequencies. Thus, we can check the frequencies of the letters are same.

Implementation

vector<int> findAnagrams(string s, string p) {
    int n = s.length(), m = p.length();
    vector<int> count(26), countp(26);
    for (int i = 0; i < m; i++) {
        countp[p[i] - 'a']++;
    }
    vector<int> ans;
    for (int i = 0; i < n; i++) {
        // add the count of i-th letter
        count[s[i] -'a']++;
        if (i >= m) {
            // remove the count of (i - m)-th letter
            count[s[i - m] - 'a']--;
            // time complexity for this operation is O(L)
            if (count == countp) {
                ans.push_back(i - m + 1);
            }
        }
    }
    return ans;
}

Complexity Analysis

Time Complexity: $O(N * L)$ , where $L =$ number of unique letters used. In this case, $L = 26$ .
Space Complexity: $O(L) = O(26) = O(1)$ , required to store the count of the letters.

Approach 3: Sliding Window

Intuition

Note that, in the previous approach in each step we are checking if the hashmaps are equal even though count of only 1 character changes at each step. This takes $O(L)$ time.

Instead of checking the equality of the frequency hashmaps at each step, we can use the sliding window approach to reduce the time complexity.

Implementation

vector<int> findAnagrams(string s, string p) {
    vector<int> ans;
    
    vector<int> count(26);
    // build the count hashmap, we are increasing the count of characters here
    for (char c : p) {
        ++count[c - 'a'];
    }
    int left = 0, right = 0, offset = p.length();
    while (right < s.length()) {
        // if a character which is present in `p` is encountered in the current
        // window, decrease the offset count
        if (count[s[right] - 'a'] > 0) {
            --offset;
        }
        --count[s[right] - 'a'];
        // expand the window
        ++right;
        
        // if offset is 0, we have found a valid anagram
        if (offset == 0) {
            ans.push_back(left);
        }
        
        // this condition is used to make sure we are shrinking the window (`++left`)
        // only if the length of the window is `p.length()`
        // this condition can also be replaced with if (right - left == p.length())
        if (right >= p.length()) {
            // only increase offset is the reduced value
            // (using `--count[s[right] - 'a]`) is non-negative
            if (count[s[left] - 'a'] >= 0) {
                ++offset;
            }
            ++count[s[left] - 'a'];
            ++left;
        }
    }
    
    return ans;
}

Complexity Analysis

Time Complexity: $O(N)$
Space Complexity: $O(L) = O(26) = O(1)$ ( $L =$ number of letters in the alphabet, since we are only using lowercase letters here, $L = 26$ here) used to store the frequencies of the characters.

Published 9 Aug 2021

CS Posts on Twitter