Java quirks and interview gotchas

Interviewers are a diverse lot. Some care about this, others about that, each has her own set of biases, and short of being perfect, there’s really no way to please everyone. The worst is when you’re doing well, then get hung up on an obscure language feature that the interviewer decides is make-or-break. This says more about the interviewer than you, but it can easily cost you an offer if you blank or aren’t prepared.

So, as a public service announcement, and in the interests of moving the conversation past some of the annoying gotcha questions, here’s a grab bag of things you should know about Java – some more important, some less so, some just plain annoying. But, well, interviews.

  • StringBuilder

Seriously, this is one of the foundational classes you use all the time, and yet I frequently meet candidates who’ve never heard of it. College students are the usual case, since string manipulation presumably doesn’t come up in their classes. It’s a critically important class, though, as the following snippets demonstrate:

// Concatenation without StringBuilder
String result = "";
for (int i = 0; i < 100; i++)
  result = result + i;

// Concatenation with StringBuilder
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++)
  sb.append(i);
String result = sb.toString();

In the first example, every concatenation creates a new String and copies all of result’s characters into it, plus the digits of i. Because result keeps growing, those copies add up to roughly 1 + 2 + … + n units of work, which makes the loop O(n²). In the latter case, you append to a char buffer each time through the loop (O(n) overall), creating the result String only once, at the end.
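To make the quadratic cost concrete, here’s a rough model rather than a benchmark. It’s an assumption-laden sketch: it counts only the characters each loop has to copy, and ignores allocation overhead and the builder’s occasional buffer resizes. ConcatCost and its methods are hypothetical names.

```java
public class ConcatCost {
    // Rough model of the naive loop: every `result = result + i` produces a
    // brand-new String, so we count every character that new String must hold.
    static long charsCopiedNaive(int n) {
        long copied = 0;
        long length = 0;
        for (int i = 0; i < n; i++) {
            length += String.valueOf(i).length(); // result grows by i's digits
            copied += length;                      // ...and is copied in full
        }
        return copied;
    }

    // Rough model of the StringBuilder loop: each append copies only i's
    // digits (ignoring buffer resizes), and toString() copies everything once.
    static long charsCopiedBuilder(int n) {
        long appended = 0;
        for (int i = 0; i < n; i++) {
            appended += String.valueOf(i).length();
        }
        return appended + appended; // the appends, plus the final toString() copy
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1_000, 10_000}) {
            System.out.printf("n=%d naive=%d builder=%d%n",
                n, charsCopiedNaive(n), charsCopiedBuilder(n));
        }
    }
}
```

With n = 100, this model gives 9,145 characters copied for the naive loop versus 380 for the builder, and the gap widens quickly as n grows.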

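Caveat 1: This is such a common anti-pattern that, for a single concatenation expression, the compiler generates a StringBuilder for you behind the scenes (since Java 9, javac emits an invokedynamic call to StringConcatFactory instead). You can see it by disassembling the class with javap -c, or, on older JDKs, by stepping through your code in the debugger. The compiler doesn’t hoist that work out of the loop, though, so the first example above still builds and copies a brand-new String on every iteration.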

Caveat 2: If you don’t specify the initial capacity of the StringBuilder (as I fail to do above), it starts with room for only 16 characters, and will probably have to resize (and copy) its buffer several times as it grows.
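A small sketch of presizing the builder. The joinNumbers helper is hypothetical, and the capacity of 190 is just the exact size of the output for 0 through 99 (10 one-digit plus 90 two-digit numbers), so the buffer never has to grow. In real code you’d usually pass a reasonable estimate rather than an exact count.

```java
public class PresizedBuilder {
    // Builds "0123...(n-1)". If initialCapacity is large enough, the builder's
    // internal buffer is never reallocated while appending.
    static StringBuilder joinNumbers(int n, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb;
    }

    public static void main(String[] args) {
        // The no-arg constructor starts with room for 16 characters.
        System.out.println(new StringBuilder().capacity()); // prints 16

        // 0..99 is exactly 190 characters, so a capacity of 190 never grows.
        StringBuilder sb = joinNumbers(100, 190);
        System.out.println(sb.length() + " / " + sb.capacity()); // prints 190 / 190
    }
}
```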

Bonus history lesson: StringBuilder was introduced in Java 1.5 as a drop-in replacement for StringBuffer. The two have essentially the same API, except that StringBuffer synchronizes all operations (which is slower, and usually unnecessary).

  • String.substring()

This is one of those things that seems horribly pedantic and terribly unfair, yet is kind of interesting, and important for you to know (both because you should understand what’s going on under the covers, and because an unusually anal interviewer might spring it on you). As of Java 7u6, the behavior and performance characteristics of the String.substring() method changed. In the old days, String contained four fields: a char[] (value), offset, count (the length), and hash. The idea was that multiple String objects could share the same char[] while having different offset, count, and hash fields. This made substring() an O(1) operation, but had a couple of problems:

  • Memory leaks. Consider the following code:
    String pi_to_a_million_digits = "3.14159265358979323846264...";
    String pi_approx = pi_to_a_million_digits.substring(0,4);
    pi_to_a_million_digits = null;

    You allocate a really big String, take a very small piece of it, and throw away the original String. Now you’re carrying around the full char[], but only using four characters.

  • Serialization. Consider what happens in the above example when you try to serialize pi_approx. You only want four characters, but you end up serializing the million character char[].

Taken together, these problems are why you sometimes see the following in legacy Java code, which forces the creation of a new char[] for the new String:

String pi_approx = new String(pi_to_a_million_digits.substring(0,4));

As of 7u6, substring() copies the requested characters into a new char[]; this fixes the above problems, and removes the need for the offset and count fields. The good news is that substring() now behaves more intuitively and no longer has a weird set of side effects. The bad news is that it’s now O(n) in the length of the substring rather than O(1), and it frequently takes more memory (because you may end up keeping multiple copies of identical data).
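A small sketch of what this means on a modern JDK. The CharBuffer.wrap() part is an aside that isn’t from the discussion above: it’s one standard-library way to get an explicit, O(1) slice if you really want the old trade-off back (in current OpenJDK it wraps the original sequence rather than copying it). SubstringDemo and DIGITS are hypothetical names.

```java
import java.nio.CharBuffer;

public class SubstringDemo {
    static final String DIGITS = "3.14159265358979323846264338327950288";

    public static void main(String[] args) {
        // Since 7u6, substring() copies the requested chars into a new String,
        // so `approx` no longer keeps the longer string's storage alive.
        String approx = DIGITS.substring(0, 4);

        // The legacy idiom still works, but now it just makes a second copy.
        String legacy = new String(DIGITS.substring(0, 4));
        System.out.println(approx.equals(legacy)); // true: same characters
        System.out.println(approx == legacy);      // false: distinct objects

        // An explicit, old-style slice: a read-only view over the original
        // sequence, which keeps DIGITS reachable for as long as the view is.
        CharSequence view = CharBuffer.wrap(DIGITS, 0, 4);
        System.out.println(view);          // 3.14
        System.out.println(view.length()); // 4
    }
}
```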

  • String.intern() considered harmful?

String.intern() is a quirky little method that’s caused a lot of trouble over the years. Interned Strings are kept in a JVM-wide pool, and intern() hands back the pooled copy instead of keeping a freshly allocated String for each use. So, if you’re reading an address from a database:

String state = resultSet.getString(1).intern();
String country = resultSet.getString(2).intern();

If equal state and country values have already been interned, intern() returns references to the previously pooled Strings (and the Strings returned by resultSet become eligible for garbage collection). If not, the new Strings are added to the pool and returned. This saves memory in cases where the same Strings are used over and over again, and speeds up comparisons a little (even so, don’t use ==; use equals(), which checks for reference equality first anyway).
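A quick sketch of what intern() buys you. The Strings are built with StringBuilder so they’re genuinely separate objects at runtime (equal string literals are already interned for you). InternDemo is a hypothetical name.

```java
public class InternDemo {
    public static void main(String[] args) {
        // Build two equal Strings at runtime so they're separate objects.
        String a = new StringBuilder("Cali").append("fornia").toString();
        String b = new StringBuilder("Califor").append("nia").toString();

        System.out.println(a == b);                     // false: two objects
        System.out.println(a.equals(b));                // true: same characters
        System.out.println(a.intern() == b.intern());   // true: one pooled copy
        System.out.println(a.intern() == "California"); // true: literals are pooled too
    }
}
```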

Prior to Java 7, intern() got a bad rap because it put Strings into PermGen, a separately sized region of memory that was garbage collected far less often than the rest of the heap. Unfortunately, PermGen is usually pretty small compared to the total heap allocation, so it was easy to hit an OutOfMemoryError (“PermGen space”) while you still had plenty of heap.

As of Java 7, interned Strings are stored in the heap, so this is no longer an issue. However, there’s still a performance penalty for calling intern(), so you shouldn’t unless you have a good reason to, and know what you’re doing.

  • Double-checked locking

The following used to be a common pattern for implementing lazy initialization:

class Foo { 
  private Helper helper = null;
  public Helper getHelper() {
    if (helper == null) {
      synchronized(this) {
        if (helper == null) 
          helper = new Helper();
      }
    }
    return helper;
  }
...
}

Alternatively, the static case, frequently used for singletons:

class Foo { 
  private static Foo instance = null;
  public static final Foo getInstance() {
    if (instance == null) {
      synchronized(Foo.class) {
        if (instance == null) 
          instance = new Foo();
      }
    }
    return instance;
  }
...
}

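This makes logical sense, and if you didn’t know better, you might even have come up with this idiom on your own (ahem). There’s a pretty amusing article on why this fails non-deterministically in all sorts of creative ways. The short version: the first check happens outside the lock, so nothing guarantees that a thread which sees a non-null helper also sees the writes made by Helper’s constructor, and it can end up using a partially constructed object. You should know the fixes, both so that you can use them when necessary and so that you understand why they show up in someone else’s code: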

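  • volatile. Declaring the helper field volatile fixes the idiom as of Java 5, when the memory model was revised (JSR-133). A write to a volatile field happens-before every subsequent read of that field, so a thread that sees a non-null helper is also guaranteed to see the fully constructed object, not just an up-to-date copy of the reference.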
    class Foo { 
      private volatile Helper helper = null;
    ...
    
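  • SingletonHolder. Because of the way static initialization works (see below), using a private static nested class as a holder for a singleton instance gives you lazy, thread-safe initialization with no explicit locking: FooHolder isn’t initialized until getInstance() first touches it, and the JVM guarantees that the initialization happens exactly once.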
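    class Foo {
      private static final class FooHolder {
        private static final Foo instance = new Foo();
      }
      private Foo() { }  // prevent instantiation from outside the class
      public static Foo getInstance() {  // must be static, or you'd need a Foo to get a Foo
        return FooHolder.instance;
      }
    ...
    }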
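Putting the volatile fix together, here’s a minimal sketch of the corrected double-checked locking idiom. Helper is a hypothetical stand-in for an expensive object, and copying the field into a local variable is an optional refinement that avoids a second volatile read on the common path.

```java
public class LazyHelper {
    // Hypothetical stand-in for an expensive-to-build object.
    static final class Helper {
        final int value;
        Helper() { this.value = 42; }
    }

    // volatile is what makes the idiom correct under the Java 5+ memory model.
    private volatile Helper helper;

    public Helper getHelper() {
        Helper result = helper;               // single volatile read on the fast path
        if (result == null) {
            synchronized (this) {
                result = helper;              // re-check while holding the lock
                if (result == null) {
                    result = new Helper();
                    helper = result;          // volatile write publishes the finished object
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        LazyHelper lazy = new LazyHelper();
        Helper[] seen = new Helper[4];
        Thread[] threads = new Thread[seen.length];
        for (int t = 0; t < threads.length; t++) {
            final int slot = t;
            threads[t] = new Thread(() -> seen[slot] = lazy.getHelper());
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        // Every thread should observe the same, fully constructed instance.
        for (Helper h : seen) {
            System.out.println(h == seen[0] && h.value == 42); // true
        }
    }
}
```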
    
  • Static class initialization

Static fields in a class aren’t initialized until the class is first actively used: an instance is created, a static method is called, a static field is read or assigned (unless it’s a compile-time constant), and so on. The details of how this works can be found here, but one key point to keep in mind is that the JVM synchronizes on this initialization, which is what allows the SingletonHolder pattern described above to work. Unfortunately, it also means that classes with time-consuming initialization can block your main thread. Sometimes, the best strategy is to trigger static initialization deliberately at start-up time, e.g. by calling a static method or using Class.forName() (note that merely evaluating Foo.class does not initialize the class).
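Here’s a small sketch of both points: the static initializer doesn’t run until the class is actually used, and Class.forName(String), which initializes the class it loads by default, is one way to pay that cost deliberately at start-up. StaticInitDemo, Expensive, and the log are hypothetical names for illustration.

```java
public class StaticInitDemo {
    static final StringBuilder log = new StringBuilder();

    // Hypothetical class whose static initializer stands in for slow setup work.
    static class Expensive {
        static {
            log.append("Expensive initialized;");
        }
        static void use() {
            log.append("used;");
        }
    }

    static String run() {
        log.append("start;");
        try {
            // Initializes Expensive now, instead of on its first real use.
            // (Expensive.class by itself only names the class; it doesn't initialize it.)
            Class.forName(Expensive.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e); // can't happen: the class is right here
        }
        log.append("warmed up;");
        Expensive.use(); // already initialized, so no setup work happens here
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // start;Expensive initialized;warmed up;used;
    }
}
```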

  • HashMap

HashMaps are such a normal part of web development that it’s easy to forget that they aren’t used much in other domains. Video game development, for instance, doesn’t tend to use them that much (though of course that depends on the game). They’re absolutely essential data structures to know, however, so you should just get up close and personal with them. Know what a hash function is, the difference between a Map and a Set, and (because some interviewers are pedantic) the difference between HashMap and Hashtable (HashMap is unsynchronized – faster! – and permits a null key and null values – a frequent source of bugs; Hashtable allows neither). Be able to explain the underlying details of how key/value pairs get stored (i.e., the key gets hashed to an int, which is mod’d to the size of the array and used as an index, then the key/value pair is stored in a linked list pointed to from the indexed cell in the array; since Java 8, a bucket that grows too large is converted into a balanced tree), and why you have to store the key along with the value (i.e., multiple entries may land in a particular bucket, so you need to be able to tell them apart).
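To make the chaining description concrete, here’s a deliberately minimal sketch. ChainedMap is a hypothetical name; it has a fixed number of buckets, never resizes, and uses Math.floorMod in place of java.util.HashMap’s actual bit-mixing and masking.

```java
import java.util.LinkedList;
import java.util.Objects;

public class ChainedMap<K, V> {
    // Each bucket chains the entries whose keys map to that index. The key is
    // stored with the value so colliding entries can be told apart on lookup.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedMap(int bucketCount) {
        buckets = (LinkedList<Entry<K, V>>[]) new LinkedList[bucketCount];
    }

    // Hash the key to an int, then reduce it to a bucket index.
    // floorMod keeps the index non-negative even for negative hash codes.
    private int indexFor(Object key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    public V put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();
        }
        for (Entry<K, V> e : buckets[i]) {
            if (Objects.equals(e.key, key)) { // same key: replace the value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[i].add(new Entry<>(key, value)); // new key: append to the chain
        return null;
    }

    public V get(Object key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        if (chain != null) {
            for (Entry<K, V> e : chain) {
                if (Objects.equals(e.key, key)) {
                    return e.value;
                }
            }
        }
        return null;
    }
}
```

Constructing it with a single bucket is a handy way to watch every key collide and still be found, precisely because each entry carries its key.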

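Caveat: there are multiple ways of resolving collisions in HashMaps. java.util.HashMap uses “chaining”, described above. “Open addressing” is an alternate method for resolving index collisions: entries live directly in the array, and a colliding entry is placed in another free slot (with linear probing, simply the next one). You should know both.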
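For comparison, a minimal open-addressing sketch using linear probing. LinearProbingMap is a hypothetical name, and it’s heavily simplified: fixed capacity, no null keys, and no deletion (which a real implementation has to handle with tombstones or re-insertion).

```java
import java.util.Objects;

public class LinearProbingMap<K, V> {
    // Open addressing: entries live directly in the arrays. On a collision we
    // probe the next slot (wrapping around) until we find the key or an empty slot.
    private final Object[] keys;
    private final Object[] values;
    private int size;

    public LinearProbingMap(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // Returns the slot holding `key`, or the empty slot where it would go.
    private int slotFor(Object key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length; // linear probe
        }
        return i;
    }

    @SuppressWarnings("unchecked")
    public V put(K key, V value) {
        Objects.requireNonNull(key, "this sketch doesn't support null keys");
        int i = slotFor(key);
        if (keys[i] == null) {
            // Always keep at least one empty slot so every probe loop terminates.
            if (size + 1 >= keys.length) {
                throw new IllegalStateException("table full (this sketch never resizes)");
            }
            keys[i] = key;
            size++;
        }
        V old = (V) values[i];
        values[i] = value;
        return old;
    }

    @SuppressWarnings("unchecked")
    public V get(Object key) {
        return (V) values[slotFor(key)]; // an empty slot holds null
    }
}
```

With a capacity of 8, Integer keys 1 and 9 hash to the same slot, so 9 ends up one slot further along; lookups for it walk the same probe sequence.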

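Bonus points: it’s also useful to know the LinkedHashMap class and how it works (it threads a doubly linked list through its entries, so iteration follows insertion order, or access order if you ask for it), since that can sometimes short-circuit an otherwise difficult problem, such as building an LRU cache.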
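As one example of the kind of problem it short-circuits, here’s a sketch of an LRU cache built on LinkedHashMap’s access-order constructor and its removeEldestEntry() hook. LruCache and maxEntries are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // accessOrder = true: iteration runs from least- to most-recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called after each put; returning true evicts the least-recently-used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

For example, with a capacity of 2: put "a", put "b", get("a"), then put "c" evicts "b", since "a" was touched more recently.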

Advanced topics

Garbage collection, concurrency, and Java 8 lambda expressions are key topics that you should understand at least at a basic level, but which deserve more space than I can devote to them in this post. I’ll try to do some basic primers (nothing fancy) soon…

If you liked this…


If you’re a Java programmer, then you should absolutely read Effective Java by Josh Bloch. Every language has its quirks, best practices, and idioms, and this book is hands down the best way I’ve found to move beyond the awkward novice period in which you know just enough to be dangerous.

Let me know if you’ve run into any other obscure details – somehow, it’s the fiddly bits that are the most interesting, and cause the most trouble.

Updated: changed “stop the world event” to “block your main thread”. Thanks ldan for the comment!


16 thoughts on “Java quirks and interview gotchas”

  1. I’ve hardly written any Java, but how is the first one an O(n^2) operation? Surely the runtime is just doing something like this each iteration:

    * allocate string for i, write i’s characters to this buffer
    * allocate string for “result + new string of i”

    Which means it’s O(n)? Inefficient, sure, but still O(n)?

  2. BTW, I tried to connect to comments via Facebook, and Chrome says

    “Your connection is not private

    Attackers might be trying to steal your information from dandreamsofcoding.com (for example, passwords, messages, or credit cards).”

  3. Love the insight on this post! Do you know how we would learn this on our own? Any particular pages? I guess books?

    • Seriously, the best place to start with Java is Effective Java. This was named after a great set of books for C++ called Effective C++ (1 and 2). The point is that you may know that language constructs exist, but not know when, how, or why to use them (or not). We typically give a copy of Effective Java to all of our interns and new engineers, then run them through a weekly reading group. Next, you read blogs – there are a lot of people out there who are going deep, looking for these types of things. You read release notes. Lastly, you code, a lot :)

  4. I like to ask the “Is Java pass-by-value or pass-by-reference” question as my last question and 50% of the time I get the answer – “It’s pass by value for primitives and pass-by-reference for objects”. So then I ask them to write me a swap method for two objects – and at that point they realize something is up! :-)

    • Java is pass by value but you DO pass the reference for non primitives.. In that sense, java is pass by copy but only the reference to the object. You can also swap 2 objects with a wrapper pattern. Your question may be misleading.

      • You are right it can be a bit confusing that’s why it’s my last question and a bonus question for those that are doing well. It’s not a decider but could push me from being a positive to a “raving positive” :-) I have hired folks who got it wrong but not many.

    • Frankly, it is an awful bit of trivia. Yes, we all know what they _mean_ when they say that Java is “exclusively pass-by-value”, but it is a dishonest way to express what is really happening. By Java’s logic, pass-by-reference simply does not exist. Passing an int& in C++ merely passes the address of the int… by value.

      So, sure, Java cannot pass variables by reference… but passing a List or Map to a function lets that function edit the content of that collection. Any sane developer would acknowledge that the object was passed by reference.

      “I didn’t put the bullet inside him, officer! My gun did!”

      • I think the key is how the question is expressed – it’s important to understand how Java passes parameters (both primitives and pointers), and you have to make sure you phrase your question in such a way as to make clear what you’re asking (there are some questions I’ve asked so many times, and seen so many people misinterpret, that I always stick with the exact same phrasing). As I discuss in an earlier post, there are many, many candidates who update pointers in a method and don’t understand why they don’t stay changed after they return to the parent. But you have to be careful, since “pass by reference” can be confusing (you’re passing the object by reference, but the pointer to the object by value – <sigh>).

  5. Few points:
    StringBuilder and Concatenation: there is no real world performance impact: http://stackoverflow.com/questions/15397515/stringbuilder-vs-concat-vs-operator-relative-performance-different-in-ecl

    volatile != double checked locking and SingletonHelper != double checked locking. both != Singleton. OP mixed up patterns to solve a concurrency problem. there is no downside in using pattern X per se.

    Static class initialization: “Unfortunately, it also means that classes with time-consuming initialization can cause stop-the-world events.”

    stop-the-world events are related to JVM and more accurately the JVM GC. I can do static { sleep(10000); } and just create custom classloader in a different thread.

    • Thanks for the comments! I actually addressed the first issue regarding concatenation in one of the caveats. The java compiler will do its best to turn this into StringBuilder, but it doesn’t require much complexity for it to decide that it doesn’t know what to do. I partially put this in to prep people for the next blog post (which just went up :).

      Regarding volatile and SingletonHelper: I’m not sure I understand what you mean. Double-checked locking is an actual problem (c.f. the link included). Using volatile to fix it was the original suggestion. And using the inner private final static class is the preferred method for static variables of this type.

      “stop the world events” – absolutely right. This is almost exclusively used when referring to GC, which this is not. As you said elsewhere, it would have been preferable to say “this can lock up the main thread.”

      • I’ll read the next blog post too. Thanks for clarification.

        Using volatile does not provide synchronization as far as I’m aware. It tells the JVM to skip the cache. It’s not race-condition proof..

        Double checked locking can be useful in many cases. For singleton case, sure, you can avoid it and use static final.. but you suggested DCL is anti-pattern. I don’t agree. I could fabricate a case that initialize more than one object that DCL is perfect for.

        If you asked me to write a singleton, I’d maybe start with an eager implementation (which you didn’t include in your post), moving to a Holder for lazy one.. I don’t think it says something about DCL.

  6. Do people still ask about singletons in interviews these days? I thought it was now pretty much considered an antipattern making unit testing difficult and violating inversion of control principles.
