Updated: Apr 9
Binary search is one of the most popular searching algorithms in computer science. If you know the binary search algorithm already, you may have come across a section in this algorithm where you try to find the middle element in the array (or search space to be more precise).
Note: It is important to note that this overflow error is not limited to binary search alone. It applies all algorithms that have a similar use case for getting middle element.
A very basic implementation of binary search uses the below formula for calculating the middle element:
mid = (left + right)/2
where left and right represent the indexes of the first and the last element in the current search space. A first glance on this basic formula for middle index calculation, everything looks correct and in fact the formula works perfectly fine for most cases, until you start noticing that your algorithm is producing an unexpected output (or even error in some programming languages) for some specific values of input.
The reason for this unexpected failure may not be fairly obvious at first. It occurs due to a hidden flaw in this seemingly straight forward formula for middle index calculation. At the root, the reason for this failure is something called an overflow condition in computer programming.
Basic Formula: Overflow Explained
Why does the above basic formula to calculate middle value result in an overflow?
The max value an integer (data type in a programming language) can hold can vary from machine to machine, OS to OS or even based on the programming language itself. For the sake of our example and for easier understanding, assume this value, lets call it MAX_INT, is 100.
Consider we are given an array with n=90 elements and our target element is the last element in the array, at 89th index. Binary search algorithm starts with left=0 and right=89. Since the target element is the last element in the array our search space keeps shifting to the right (refer binary search for more details). So (left,right) will be (0,89) for the first iteration and (45,89) for the second iteration. Now using our basic formula to calculate mid value for the first iteration, we get mid = (0+89)/2 = 45, for the second iteration, mid = (45 + 89), but (45 + 89) exceeds our MAX_INT value of 100 resulting in overflow. Thus our program might produce some unexpected output or result in an error depending on the programming language we are using. This is how the basic formula for middle element calculation fails. Also note that, generally the MAX_INT will be much higher, we assumed it as 100 just for the ease of understanding.
Now that we know why the overflow occurs when we use the basic formula, lets see how it can be prevented using the optimized formula mid=left + (right - left)/2.
Mathematically both the equations (one used in basic formula and the other in optimized formula) are same and they produce the same output, i.e. left + (right - left)/2 = (left + right)/2.
We can prove this as shown below. Lets say,
left + (right - left)/2 = x
Multiplying 2 on both sides of the equation we get:
2 * left + (right - left) = 2 * x
Simplifying the equation will result in:
left + right = 2 * x
which implies x = (left + right)/2
Therefore we can say that, left + (right - left)/2 = (left + right)/2 which means both equations will produce the same output for all values of left and right.
Now that we know both equations are same, lets see why left + (right - left)/2 does not result in an overflow as compared to (left + right)/2.
Remember left and right represent indexes of the leftmost and rightmost element in our search space. To prevent overflow we have to ensure that, the value of the equation left + (right - left)/2 equating to mid (which is also an index),
Does not exceed the max integer value.
Also that the value isn't negative.
Lets see how the equation left + (right - left)/2 ensures the above two conditions are met.
As a matter of fact we know that left, right >= 0 and right is always greater than or equal to left for all cases for the given input array. Thus we can safely say that, (right-left) >= 0 for all values of right and left. Therefore it is safe to say (right - left)/2 in equation left + (right - left)/2 cannot be negative, and hence the optimized equation left + (right - left)/2 cannot be negative. Therefore we have taken care of the lower bound (first condition).
Now coming to the second condition, for the given input array, by definition (left,right) <= MAX_INT value. So (right - left) will result in a value smaller than MAX_INT always. And therefore, (right - left)/2 is also smaller than MAX_INT. Mathematically, if you add left to this value i.e. left + (right - left)/2, it can't exceed MAX_INT. So the new equation satisfies the second condition as well.
Therefore the equation mid = left + (right-left)/2 doesn't result in an overflow.
Binary search is a fairly simple algorithm. But it has a few subtleties, one of which we have tried to address in this article. At first glance this might seem like a simple and straightforward thing to be even missed. However such things when missed, even though they seem minor, can really come to bite you at a later point.
There are many variations of binary search and caveats associated with it. We will try to cover them in our upcoming articles. Until then stay tuned!
That is all for this article, thank you for taking your time to read this. If you have any questions or doubts, please let us know in the comments section below, we will be happy to answer you.
If you found this article useful, do not forget to subscribe to our website, your support motivates us to bring out more such articles in future (scroll down to the bottom of the page to find the subscription form).
You can explore more such amazing articles from code recipe in our blogs section.
Get 100% discount on Code Recipe Membership Plan. Join now and get exclusive access to premium content for free. Hurry! Offer only available for a limited time. Join now.
Want to master coding? Looking to learn new skills and crack interviews? We recommend you to explore these tailor made courses: