Binary Search Basics
If you know what is binary search, I encourage you to skip the first section and goes to the questions.
Before we dive into real problems, I want to briefly discuss about binary search. Binary search is a technic that can search for the existance of a value in a sorted array in O(logn) time, where n is the size of that array.
An example array v, which is sorted, is shown below:
1 | index: 0 1 2 3 4 5 6 7 8 9 10 |
Say we want to determine if 10 is in the array or not, an intuitive solution is walk though this array and check arr[i] == 9. The time complexity of such algorithm is O(n). Note that this is usually the best solution if the array is unsorted because binary search can not be applied to unsorted array (A rotated, sorted array is an exception and we will talk about it later).
To reduce the time complexity of searching for a value in sorted array from O(n) to O(logn), we will use binary search next.
The C++ snippet is as follow:
Let’s walk though the code using our example:
At the very beginning, l = 0 and r = 10. v[l:r], where l:r means [l,l+1,...,r-1,r], denotes the search range, in other words, we know that if the target value exists in v, it must lies in v[l:r].
1 | index: 0 1 2 3 4 5 6 7 8 9 10 |
Then we compute the median v[m] of v[l:r], which is the value right in the middle of v[l:r]. Note that we are not necessary computing the exact median when there are even number of elements in the range. That is to say we are calculating $v[ \lfloor \frac{l+r}{2} \rfloor ]$.
1 | index: 0 1 2 3 4 5 6 7 8 9 10 |
As we can see, v[m] = 9 and v[m] < 10. Since v[m] = max(v[0:m]), we can safely say that if 10 is in v, it must in v[m:r].
With that in mind, we narrow our search space to v[5,10].
1 | index: 0 1 2 3 4 5 6 7 8 9 10 |
After we reduce the search range, we calculate m by (l+r)/2. Note l, r, and 2 are integers therefore the fraction part of (l+r)/2 is ignored.
1 | index: 0 1 2 3 4 5 6 7 8 9 10 |
This time v[m] is larger than 10. Because v[m] = min(v[m:(v.size()-1)]), we can safely conclude that 10 won’t appear in v[m:10]. So we change our search space to v[l:m] by setting r = m.
1 | index: 0 1 2 3 4 5 6 7 8 9 10 |
Now v[m] = 11, which is still larger than our target 10, therefore we set r = m. Note that in our C++ snippet we stop when l == r - 1, therefore we stop when l and r are adjecent.
1 | index: 0 1 2 3 4 5 6 7 8 9 10 |
Then we test v[l] and v[r] to see if they equal to 10. Unfortunately none of them is 10 therefore we say 10 is not in this array v. One may wondering why comparing v[r] since r is m in the previous step and we already compared v[m] in last step. Thing may get tricky when one is trying to save this two comparison. An example is what if l == m? In such case we do not need to compare v[l] but v[r].
Binary Search Questions
There are five main parts of a binary search algorithm:
- Invalid input detection. One need to filter our all input with a length less than 0 because this will make
r == -1. - Iteration stop condition. Usually I like to use
l < r - 1because it is easy to implement and can avoid specific cases. But you can always usel < r - k, wherek >= 0. - Pruning condition. The classic binary search uses
v[m] < targetorv[m] > target, but this depends on your problems. - Boundry condition. By using
l < r - 1, one do not need to do detecting but check the value ofv[l]andv[r]. - Boundry check. Check boundary values
v[l]andv[r]and deal with the case when target is not found.
Problems on Sorted Arrays
Binary Search
The question description can be found at the following site:
Lintcode: (14) Binary Search
This is the basic binary search problem, the code can be found above.
Search insert position
The question description can be found at the following sites:
Lintcode: (60) Search insert position
Leetcode: (35) Search insert position
The answer of this question is a twisted version of classic binary search. Namely we just need to change the invalid input detection and boundry check.
The invalid input detection should output 0 instead of -1 because when target is not found, the output should be the index of target if it was inserted. In this case, it is 0.
As for boundary check, we return l or r if m[l] == target or m[r] == target. If none of them equals to target, we need to determine where to insert target. The index should be the smallest value in m that is larger than target.
Wood Cut
Find Bad Version
Find the nearest pow(i,2) that equals or less thansqrt(x)
The question description can be found at the following sites:
Lintcode: (141) sqrt(x)
Leetcode: (69) sqrt(x)
This problem is also a variant of binary search problem. One thing we need to mention is how to calculate sqrt(2). The question does not require you to return 1.414 but 1 (the largest integer that satifies x^2 < target). Therefore, we need to change the pruning condition to comparing m[mid]^2 and x.
Find value range in sorted array containing duplicate values
The question description can be found at the following sites:
Lintcode: (61) Search for a Range
Leetcode: (34) Search for a Range
If we keep using the old algorithm, we can find v[index] = target, but we can not guarantee which index we returned. An example is:
1 | index: 0 1 2 3 4 5 6 7 8 9 10 |
Here we return index = 2, but we can not return the correct range of 3, which is [1,3].
A simple solution is search the left and right part of index one by one until we find an element not equals to target or reaches the boundary. This is doable but in the worse case it will cost O(n) time, which is not what we want.
To solve this problem in O(logn) time as usual, we need to twist the traditional binary search algorithm. Remember, we keep reducing the search space by comparing v[mid] to target, and we stop when v[mid] == target. What if we change the comparison? I do not know how to search the range in one binary search, so I break the result into two parts, namely find the left most target and the right most target.
To find the left most target in this array, we should change the comparison from
1 | if (v[m] > target) { |
to
1 | if (v[m] >= target) { |
In such way, if we find v[m] == target, we will not stop. Instead we will search the left part while keeping v[r] == target. When l == r - 1, we just check v[l] and v[r] and see if v[l] == target, which is unlikely in this case, or v[r] == target. We return the index of the leftmost one that equals to target. By doing this, we find the leftmost target.
The way of finding the rightmost target is similar and is shown in the follwing code.