College recruiting season is almost upon us again, and I’ve been doing a little research to try to find a new question. My current question is OK, but it’s a little clunky, and I’ve been sniffing around for something new.
In my research I’ve been going through Programming Problems. It’s a good compendium of the different types of problems you’re likely to run into, focusing mostly on different data structures (lists, trees, queues, stacks, hash tables, and so on) and algorithms (searching, sorting, etc.). It’s interesting how common some of the questions are – even many of the “problem solving” style questions are ones I’ve seen before.
So I was a little surprised when I ran into an unfamiliar, difficult question in the early chapters (“This question was a favorite of interviewers at Google for some time,” according to Green), with a complicated answer that left me scratching my head. OK, it happens, some people ask tough questions and if you see your way through, great, if not, well, that’s the way it goes.
Except that it gnawed at me. This is the mark of a good question – it looks hard at first, but as you think about it the way opens up. And I decided that the answer in the book was, if correct, horribly over-complicated. It also had a terrible big-O running time.
And so, I present to you the problem and my preferred solution. I hope you find this useful, or at least interesting. I encourage you to try to solve the problem before reading the walkthrough.
The problem
Reconstruct a binary search tree (BST) from a list (or array) that contains the results of a traversal of the tree. You may use any traversal you wish.
Traversals
First, a little background. There are three canonical ways to traverse a BST:
- in-order: in which you first traverse the left branch, then the root, then the right branch, recursively. This gives you an ordered list of values.
- pre-order: in which you first traverse the root, then the left branch, then the right branch.
- post-order: in which you first traverse the left branch, then the right branch, then the root.
These traversals can be demonstrated with the following tree:
10 / \ 5 20
- in-order: [5, 10, 20]
- pre-order: [10, 5, 20]
- post-order: [5, 20, 10]
Now, on to the problem.
How to think about the problem
If you were guaranteed that this were a balanced, complete tree, then you could easily use the in-order traversal to solve the problem. This, however, is the first trap – every time you catch yourself adding a constraint to a problem (“if the tree is balanced and complete…”), you need to test this with the interviewer. In this case, the answer is no, it could be any valid BST, even a terribly unbalanced one.
The next step is to figure out an approach. In this problem, our first question is which traversal to use. Can we use in-order? Even with the simple tree up above, there are five different trees the in-order traversal could have come from. So no, in-order is out.
What about pre-order? With pre-order, we first push the root node, then do the left sub-tree, then the right, so we’re guaranteed that the 10 is the root, the 5 is to the left, and the 20 is to the right. Seems like a possibility.
Post-order is very similar to pre-order, only it’s the last element in the buffer that goes into the root spot. This also seems like a possibility.
Ultimately, simpler seems better, so I’m going to decide to go with pre-order.
At this point, it’s helpful to draw a representation of what your list looks like. This isn’t just for fun – research shows that it’s hard to keep multiple things in your mind simultaneously, and putting things (especially pictures) onto the whiteboard can clear space for other thoughts. Don’t be macho and think that you’re special in this regard – it’s true of everyone, and using the whiteboard to cache important visual ideas is like giving yourself a RAM upgrade.
In this case, we have the ROOT, then the Left sub-tree, then the Right sub-tree. So the elements look like this:
ROOT L L L L L L L L R R R R R R R
Crucially, the same is true of both the left and right sub-trees. I.e., the first element of each is the root of that sub-tree, and so on. This will be important later.
Set up your data structure
We’re going to go simple here – no need for a container Tree class, everything will be a Node:
public class Node { Node left, right; int value; public Node(int value) { this.value = value; } }
At this point, we need to think about the algorithm. Let’s just start writing, and see where it gets us.
public static Node reconstruct(int[] values) { // TODO }
We could use an ArrayList<Integer> for the values, but in this case an array is simple, and gets us what we want.
Next, we know that the first element is the root, so let’s take care of that.
public static Node reconstruct(int[] values) { Node root = new Node(values[0]); root.left = get_left_subtree_somehow(values); root.right = get_right_subtree_somehow(values); return root; }
OK, this is a good next step, but now we’re stuck. How are we going to get the sub-trees? Remember what we said before about each of the sub-tree sections (“L L L”, “R R R”) following the same pattern, recursively? Because of that, we can do the exact same thing, over again:
public static Node reconstruct(int[] values) { Node root = new Node(values[0]); root.left = reconstruct(values); root.right = reconstruct(values); return root; }
Unfortunately, as I’m sure you’ve already seen, this will just recurse infinitely. You need to restrict the method to only work on the part of the array that you care about. For that, we’ll need some new parameters. “start” will be the first index, and “end” will be one after the last index (this is a common pattern in Java – c.f., String.substring()).
public static Node reconstruct(int[] values, int start, int end) { Node root = new Node(values[start]); int left_subtree_start = start + 1; int right_subtree_start = ? // we have to figure out where the right subtree starts root.left = reconstruct(values,left_subtree_start,right_subtree_start); root.right = reconstruct(values,right_subtree_start,end); return root; }
OK, we’re getting closer. Given that we know that everything between indices “start+1” and “end” is a sub-element of values[start], we can just see what’s smaller, and what’s bigger, than values[start].
public static Node reconstruct(int[] values, int start, int end) { Node root = new Node(values[start]); int left_subtree_start = start + 1; int right_subtree_start = left_subtree_start; // find the start of the right sub-tree // also, make sure we don't run off the end while (right_subtree_start < end && values[right_subtree_start] < root.value) { right_subtree_start++; } root.left = reconstruct(values,left_subtree_start,right_subtree_start); root.right = reconstruct(values,right_subtree_start,end); return root; }
OK, looking pretty good! All of our pseudo-code and question marks are gone, and the code looks done. This is the point at which you should go down a mental checklist. First and foremost is, have you covered all the base cases? No, you haven’t. What if there’s nothing in a sub-tree (i.e., start == end)? It’s going to happen eventually, at which point this code will break. So let’s cover that case:
public static Node reconstruct(int[] values, int start, int end) { if (start == end) { return null; } Node root = new Node(values[start]); int left_subtree_start = start + 1; int right_subtree_start = left_subtree_start; while (right_subtree_start < end && values[right_subtree_start] < root.value) { right_subtree_start++; } root.left = reconstruct(values,left_subtree_start,right_subtree_start); root.right = reconstruct(values,right_subtree_start,end); return root; }
The next item on the checklist is “have I answered the question?” Again, the answer is no. You may look at this and think you’re done, but this isn’t the kind of code you would expose to an end-user. They shouldn’t have to specify start and end. So, it’s time to turn our method into a helper method, and use the externally-visible method to set things up (this is a very common practice when doing recursion).
public static Node reconstruct(int[] values) { return _reconstruct(values,0,values.length); } private static Node _reconstruct(int[] values, int start, int end) { if (start == end) { return null; } Node root = new Node(values[start]); int left_subtree_start = start + 1; int right_subtree_start = left_subtree_start; while (right_subtree_start < end && values[right_subtree_start] < root.value) { right_subtree_start++; } root.left = _reconstruct(values,left_subtree_start,right_subtree_start); root.right = _reconstruct(values,right_subtree_start,end); return root; }
Now, finally, it’s time to go through a sample, prove to yourself that the code works, look for anything else you’ve missed, and call it done.
End notes
If you find thinking through these kinds of problems fun and interesting, you should send me your resume! TripAdvisor is growing, and we’re looking for people who love to code in a collaborative, friendly, fun environment.
If you liked this post, you’ll also like:
I think there’s a couple of issues with the solution presented.
Although it solves the problem, its O(n^2) in the case of an unbalanced left-biased tree, due to the while() on each on each call.
Also, an unbalanced left free will cause you to recurse n levels deep which – depending on your environment & language – will probably mean you run out stack space after a few 1,000 or 10,000 calls.
For bonus points, I’d add error checking to make sure its a valid binary tree (and that new doesn’t return NULL)
So I thought about this on the drive home, and came up with an O(n) answer, I think. The key observation I made was that if I am looking at value[n+1], then if it is less than value[n], then it must be the immediate left child of value[n]. If the opposite is true, then it must be a right child of this node or one of its parents. So we get a loop like (where Node(parent, value) is the constructor):
which is pretty striaght forward apart from the get_right_node() function. What is this function? Well, since we’re doing preorder the parent node must already have been deserialized. And we also know that parent->value must be less than the current node. So we get
Does this inner while() loop make deserialize() an O(n^2) op? No, because each node is only considered at most once during the whole process, since we never go back down the tree again. You can even remove the parent node pointer and maintain a (heap based) stack of the current parentage, which at most is O(n) depth for a very unbalanced tree.
(You can get rid of the parent completely and search the partially complete tree each time for an O(n log n) solution),
Now I want to actually test it :)
I fixed the formatting :)
I like this a lot. You forgot to update currentNode in get_right_node, but otherwise it looks solid.
Yup, you’re right, missed that!
Could the solution be simplified even more by reusing or implementing the insert method previously discussed?
If the values are supplied as pre-order then it is simply a case of iterating and adding each.
public static BST reconstruct(int[] values) {
BST tree = new BST();
for (int value : values) {
tree.insert(value);
}
return tree;
}
This will only work for a binary search tree as stated in your question. Reconstructing an arbitrary binary tree would be more complicated.
Looks like it should work.
You have listed the three *depth first* traversals – surely the best way to answer this problem is to say that that you will do a (left biased) *breadth first* traversal, and will return null every time there is a possible branch that isn’t filled.
So a tree like
10
4 17
1 6 19
would return 10,4,17,1,6,null,19
and
10
17
19
would give 10 null 17 null 19
Its now obviously completely trivial to reconstruct the tree in linear time. If I wanted points for style I could use an Optional instead of null.
But if you aren’t allowed to mark empty slots?
You could, for example, mark empty left nodes with int max and empty right nodes with int min. Clearly those values are impossible. If that is ‘cheating’ then the question was not well posed!
Without that information it’s clearly impossible to reconstruct the tree generally unless you mark the empty nodes. Consider a tree of four fives. There are 8 possible trees but all return 5555 – they are degenerate and your answer cannot reliably reconstruct the tree. You either have to mark the empty branches or assume every vales in the tree is unique.
If you assume the values are unique then you can just return the full traversal, I.e. Mark every time you visit a node rather than the first time only. That will again make it very easy to reconstruct the tree. It’s less elegant and has worse space complexity but it will be very fast. E.g.
5
1 9
Would be 5 1 5 9
That again will give you the branching information trivially, though it’s less elegant than your solution.
So, this is a really important point. One of the most common mistakes in an interview is to invent additional restrictions that aren’t in the original problem. MIN_INT and MAX_INT are *not* obviously invalid values. On the contrary, any int value should be acceptable. If your solution requires “sentinal values”, then it’s almost certainly wrong.
Likewise, different people may use a binary search tree for different purposes. You can’t just assume that a BST will contain only unique values. For instance, you could use it to represent a sequence of sorted values. There’s nothing that says a sequence can’t contain duplicate values.
Bottom line: always be careful to avoid implicit assumptions about the problem, and always test your assumptions with the interviewer.
Um int max can never be the *left child* of a node. Int min can never be the *right child*. That’s not a constraint – that’s simply logical. Since I am marking empty nodes I will always know if a left or right child.
I looked up the definition of a BST and duplicate values are prohibited because the inequality is strict.
Good point – this will be true if the tree doesn’t allow dupes. Not a hard and fast rule – depends entirely upon implementation.
Actually, in a bst probably you aren’t allowed duplicates, so if you BFS like I suggested and start inserting you will always build up the correct tree. Each layer divides up the number line, each new node must go in exactly one place in the number line, which only leaves one ‘bin’ for it to go in.
Hi Dan,
Thanks for the post. It surely is an interesting problem. I kind of solved it in the same way as Andrew but my solution uses a stack to store ancestry information. Worst case space and time complexities both are O(n).
C++11 implementation: http://pastebin.com/jpLcKtzx