Updated: Apr 15
For a given input string s, return the length of the longest substring in s without repeating characters. s consists of English letters and digits.
Input: s = "abcabcbb" Output: 3
Input: s = "bbbbb" Output: 1
Input: s = "pwwkew" Output: 3
Let us try to understand the problem statement first. This is a pretty straightforward problem if you know what a substring is for a given string.
A substring a continuous sequence of characters in a given string. For example, the substrings of string "abcd" are: "a", "ab", "abc", "abcd", "b", "bc", "bcd", "c", "cd" and "d". If the characters are not continuous it is not considered a substring. "bd", "ad", "acd" etc. are not substrings because the characters are not continuous as compared to the original string s.
Note: Substring is different from a subsequence which may not be consecutive in nature.
Now that we know what a substring is, we need to write an algorithm that returns the length of the longest substring in string s. But wait, remember we cannot just return any substring that is longest as clearly mentioned in the problem statement, we are only allowed to consider substrings without repeating characters. "Substring without repeating characters" is nothing but a substring containing all unique characters, there should not be any duplicate characters/letters in the substring.
Now that the problem statement is clear, lets see what approach we can use to solve it:
Solution 1: Naive Approach
The most simple and straightforward solution to this problem is to check every possible substring for the given string s and return the length of the longest substring without repeating characters.
This algorithm involves the following steps:
For the given string s, generate all possible substrings. We can generate all possible substrings by using two loops. Fix the outer loop and move the inner loop (second loop) index to generate all possible substrings (refer code for more details).
As and when you obtain the substrings using the loops (described in step 1), you can check if the current substring contains only distinct characters or has duplicates. We can do this by going trough all the characters of the obtained substring using another loop and adding it to a hashmap.
During the iteration of the third loop, if you encounter any character that is already in the hashmap, the obtained substring contains duplicate characters. Otherwise all the characters in the substring are unique.
If the substring contains duplicate characters, skip the current substring, do not consider it for calculating the result.
If the substring contains all unique characters, check if the length of the current substring is greater than the length of all the unique character substrings obtained so far. If yes, update the result, else move on to the next substring and repeat steps 1-5.
The running time complexity of this algorithm O(n^3).
Time Complexity: O(n^3)
Time complexity is O(n^2) for finding the substrings using first and second loops and O(n) to find out if it is contains non repeating characters. So the total time complexity is O(n) * O(n^2) = O(n^3).
Space Complexity: O(1)
You might ask, we are making use of a hashmap to store characters of string, so how is the space complexity O(1). If you look at the problem statement carefully, it clearly states the given input string can contain only English letters and digits. Therefore the space needed to store these characters in a hashmap is constant i.e. 26+26 for upper and lower case English letters and 10 for digits. So in the worst case the space needed to store this in a hashmap is O(26+26+10) = O(62), which in general can be represented as O(1).
This approach can be optimized to improve the time complexity to O(n^2) which is better than our earlier time complexity of O(n^3), but can we do better?
Want to master coding? Looking to learn new skills and crack interviews? We recommend you to explore these tailor made courses:
Solution 2: Sliding Window Technique
This problem can be efficiently solved by using the two pointer sliding window technique. We slide the window forward each time we find a duplicate character. In this approach we need to process each character in the given string only once to find out the result.
For a given input string s this algorithm does the following steps:
We take two variables start and end to indicate the start and end of the window. The string between the start and end is our current substring. Initially both start and end point to the 0th index (1st character) of the string s, i.e. start = 0 and end = 0. Also create a result variable to store the required result.
Next we create a hashmap which helps us to efficiently check if the current substring contains any duplicate characters. Lets name it charIndexMap. Key to this hashmap is the current character and value of this hashmap is the position of the character represented by the hashmap key in the given string s.
In each iteration we check if the current character at end index is present in the charIndexMap or not.
If the character is not present, add it to the hashmap along with the index.
If the character is present in the map already, it means that the current character is a duplicate and we should not be considering the substring with this character while calculating the result. So, take the length of the substring from the start of the window until the previous character. If the current length is greater than the result, update the result. Also since we have found a duplicate character, we need to update the start index. Another important step here is to remove all characters from the previous window from our hashmap (since we have moved our window by updating the start variable). Finally add the current character to our hashmap along with its index.
How do we update the start variable once a duplicate character is found?
Remember along with storing the characters in hashmap, we also store the index of the character in the string as hashmap value. When we get a duplicate character during our iteration what does this value in hashmap mean? It simply means that the current duplicate character was earlier found at this index represented by the hashmap value (first occurrence). Using this info we discard all the characters before the first occurrence of the duplicate including the first occurrence (discard all characters from previous window). So our new start value will be start = map[current character] + 1.
Note: Character literals are represented by uint8 or rune datatype in go.
Lets see the working of the above algorithm with an example:
Consider s = "abcabcbb" is the given string. Initially start = 0 and end = 0 and our hashmap is empty as shown in the figure below. Also we initialize result = 0.
Now we check if s[end] = 'a' is present in our hashmap. Obviously 'a' is not present in hashmap, so we add 'a' as hashmap key and 0 as hashmap value indicating 'a' is found at 0th position in string (index based position). Same is shown in figure below:
Next we increment end pointer.
Again check if s[end] = 'b' is present in map. As 'b' is not present we add it to hashmap as shown in diagram below.
Increment the end pointer, s[end] = 'c', and since it is not in map we add it to our hashmap.
Increment end again. Now s[end] = 'a'.
Now s[end]='a' is present in map, which means it is a duplicate character. Since we are looking only for substrings without repeating characters, we cannot consider the current substring "abca" for our result, so our valid substring is only till the previous character, i.e "abc".
Since we have found a duplicate the next step is to update result. result = max(result , length of substring from index 0 to 2) = max(result , length of substring "abc") = max (3,3) = 3.
Also since the current character is a duplicate, we need to update out start pointer. We update start as start = first occurrence of duplicate + 1. We can get the first occurrence of duplicate from our hashmap. So start = map['a'] + 1 = 0+1 = 1.
Next, remove all the characters before our new start index and previous start index from hashmap (in this case our previous start index was 0 and new start index which we calculated a just before this step is 1) since they are no longer in our new window, so we remove 'a' from hashmap.
Add the current character 'a' to map along with its index to hashmap. Now our diagram looks as shown below:
Increment end pointer, s[end] = 'b'.