Blogbody Rotating Header Image

HashMap.get() can cause an infinite loop!

Yes, it is true. HashMap.get() can cause an infinite loop. Everyone I’ve talked to didn’t believe it either, but yet there it is — right in front of my very eyes. Now, before anyone jumps up and shouts that HashMap isn’t synchronized, I want to make it clear that I know that. In fact, here is the paragraph from the JavaDocs:

Note that this implementation is not synchronized. If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

Now, I’ve always taken that standard warning in all the Collections classes as simply stating the obvious: these data structures are synchronized and if you aren’t doing your own synchronization you’ll get dirty/unreliable/unknown data and/or behavior.

But never in my wildest dreams would I have thought that without proper synchronization that an infinite loop could occur! Let me explain. First, it helps to take a look at the get() implementation in HashMap. Right off the bat it becomes easy to see how an infinite loop could occur:

public Object get(Object key) {
Object k = maskNull(key);
int hash = hash(k);
int i = indexFor(hash, table.length);
Entry e = table[i];
while (true) {
if (e == null)
return e;
if (e.hash == hash && eq(k, e.key))
return e.value;
e = e.next;
}
}

Just seeing that while(true) block simply shocked me! Looking at it a little further, it seems that it is possible for e.next to point back to e (either directly in a tight loop or indirectly in a longer loop), causing an infinite loop and effectively locking up your thread.

I looked at HashMap a bit more and found the only place where e.next is assigned and could possibly cause this problem:

void transfer(Entry[] newTable) {
Entry[] src = table;
int newCapacity = newTable.length;
for (int j = 0; j < src.length; j++) {
Entry e = src[j];
if (e != null) {
src[j] = null;
do {
Entry next = e.next;
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i];
newTable[i] = e;
e = next;
} while (e != null);
}
}
}

This method is called when the internal hash tables are resized. Now, at this point I don’t know how it happens, but somehow the table entries get in a state where an element points back to itself, causing the next get() call to loop forever.

Did anyone else know this? Does anyone else think that this is “OK” or “not OK”? Like I said, I always assumed that the JavaDocs warning was in regards to data consistency — not actual program failure or “lock up”. I’m curious to hear what other people think, as I know I was definitely shocked to learn this.

PS: The original reason I didn’t synchronize the map was because I really didn’t care about data consistency. But I certainly care about avoiding infinite loops, so I’ve since added some synchronization.

29 Comments on “HashMap.get() can cause an infinite loop!”

  1. #1 Captain Obvious (RET)
    on Jul 25th, 2005 at 10:51 am

    So you are complaining that the javadoc is right?

  2. #2 JRB
    on Jul 25th, 2005 at 11:18 am

    I had this problem in the past — or at least something very similar. Once a week our application would crash, with the CPU running at 100% until we killed the VM. In the end I traced it to a hashmap which we were using as a cache. The URL’s for our various pages were supposed to be the key to the cache, but a bug in code was including the jsessionid along with the url — hence the hashmap would just grow and grow in size. However, it never blew out memory — always the CPU. What you describe goes some way to explaining why the CPU was hitting 100% and never recovering.

  3. #3 Patrick Lightbody
    on Jul 25th, 2005 at 11:39 am

    Captain Obvious,
    Thanks for the well thought out comment…

    I wasn’t “complaining” about anything - I was pointing out that my assumption (and that of most other Java developers) may be wrong about why the synchronization is required.

    I would think adding to the JavaDocs a tidbit about why it is needed would go a long way. While technically correct, merely saying “it must be synchronized externally” probably isn’t enough. Most developers associate synchronization with data consistency, not endless loops.

  4. #4 Rob Meyer
    on Jul 25th, 2005 at 12:25 pm

    Most developers would be wrong then; synchronization problems mean that data is not consistent…and loop counters are data (among other important control structures). So if the loop counter (or iterator or whatever) is inconsistent, then loops terminating improperly (or never) is perfectly reasonable unsynchronized behavior.

    Could the docs call it out a little clearly? Maybe, but do they have to put the same dire warning every time they talk about synchronization? That could get a little wordy. “Must” is about as unambiguous as it comes (which is why its found so often in specifications). I’d just consider it a lesson learned, and when the docs say “must”, they don’t mean it’s optional. :-)

  5. #5 Ron
    on Jul 25th, 2005 at 12:54 pm

    yeah, traversing the bucket resulted in frequent infinite loops in an old java 1.4.1_xx due to a bug in hotspot. let me tell you that was a blast.

  6. #6 Patrick Lightbody
    on Jul 25th, 2005 at 2:50 pm

    Rob,
    You’re right that the infite loop is a manifestation of data inconsistency — I just always figured the data inconsistency would happen at the API level, not the internal implementation. You’re right that it was a lesson learned for sure :)

    Though, simply adding in a loop counter and breaking when it reached > the bucket size would go a long way towards catching these. Anyway, I just thought it was interesting. I appreciate the response (as opposed to the snarkiness of Captain Obvious).

    Patrick

  7. #7 Eoin Curran
    on Jul 25th, 2005 at 3:18 pm

    It looks like the map could get in that state if two threads are hitting transfer() at the same time. If they both hit e=src[j] before either one has called src[j]=e then one of the threads could end up seeing the effect of the other calling newTable[i]=e, and hence set e.next=e (this thread could also end up in an infinite loop in transfer() ).

    Anyway, looks like it’s safe for concurrent reads at least.

    eoin.

  8. #8 Brian Goetz
    on Jul 25th, 2005 at 4:24 pm

    > Anyway, looks like it’s safe for concurrent reads at least.

    I haven’t looked at the code you are talking about, but based on experience with hundreds of cases where people have made the same sort of statement, this statement is almost certainly wrong, and reflects several common misconceptions about concurrency, and is probably nearly isomorphic to the erroneous assumption the original poster made of ‘I thought if I didn’t care about stale data, I’d be OK’“.

    Unless you are well versed in concurrency, your intuition about “what could go wrong” in the absence of synchronization (even on read-only paths) is generally incomplete, and reasoning about the order in which things happen is so tricky that you will almost certainly be wrong.

    Whatever you think the “worst case” from not synchronizing is (e.g., stale data), it is almost certainly worse than that.

    The moral of the story: whenever mutable data is shared across threads, if you don’t use synchronization properly (which means using a common lock to guard every access to the shared variables, read or write), your program is broken, and broken in ways you probably can’t even enumerate.

  9. #9 Brian Goetz
    on Jul 25th, 2005 at 4:38 pm

    > While technically correct, merely saying “it
    > must be synchronized externally” probably isn’t
    > enough. Most developers associate
    > synchronization with data consistency, not endless loops.

    But this is sort of like complaining that a warning label on a gun that said “caution, could cause harm to operator” doesn’t adequately warn you of harm to others. (And, if the label warned about harm to others, you could then complain it didn’t warn you about holes in your wall.)

    In general, “inconsistent data” means “your program may crash horribly at any point forward in time.” Lets say you have an ArrayList, which maintains internally an array reference and a size field which indicates the index of the last entry. If the array reference and the size are not consistent with each other, the obvious iteration idiom can throw ArrayIndexOutOfBoundsException, or NullPointerException, among other failure modes. Are such unchecked exceptions when doing somehting so harmless as iterating a List any less bad than infinite loops?

    “Data inconsistency” seems like a pretty dire warning to me. What should the warning say?

  10. #10 Patrick Lightbody
    on Jul 25th, 2005 at 7:39 pm

    Brian,
    No, you’re right — all good points. Don’t get me wrong, I feel very humbled about this whole experience, and I’m glad it happened (this kind of stuff is good to keep one’s ego in check from time to time).

    I’m just saying I always though the danger of the Java Collections API used in an unsynchronized way was that the data coming out of the API would be possibly inconsistent. It never occurred to me that the API methods may just never return :P

    I’m not “complaining” here, I’m merely pointing out that many developers likely had the same assumptions I did. I mean, even those that know unsynchronized == non-deterministic wouldn’t expect that the HashMap get() call could do what it did.

    It wouldn’t hurt to add a check in future versions of HashMap to fail out of this, much like Iterator is fail-fast. It wouldn’t be guaranteed, but it could help with debugging.

  11. #11 Alan Green
    on Jul 25th, 2005 at 7:41 pm

    Good post, and it highlights why the straight-jacket threading restrictions imposed by the EJB spec are such a good idea.

  12. #12 Tim Peierls
    on Jul 27th, 2005 at 12:29 am

    “It wouldn’t hurt to add a check in future versions of HashMap to fail out of this, much like Iterator is fail-fast. It wouldn’t be guaranteed, but it could help with debugging.”

    But some folks might not appreciate having that extra check burned into every HashMap in existence, including those instances that are only accessed from a single thread (for which such checks are a waste of time).

    And don’t be fooled by the name ConcurrentModificationException; it can be thrown in single-threaded code, and is not a reliable way to detect improper synchronization.

  13. #13 Juozas Baliuka
    on Feb 15th, 2006 at 12:25 am

    Source code level “read-only” operation can be implemented in many ways by JVM and “read-only” assumptions are wrong in general.

  14. #14 Anuradha
    on Nov 8th, 2006 at 7:53 am

    Thanks a lot for posting this very useful information in a very clear and explanatory way.
    It helped us to nail down the issue faced : 100% CPU utilization when a ‘get’ was done, of a HashMap object, which was structurally changed as indicated above. In this specific case the HashMap object was re-initialized by one user, while it was accessed by another.

  15. #15 Anuradha
    on Nov 8th, 2006 at 6:57 pm

    Thanks to this very useful information, explained so clearly.

    It helped us nail down the 100% CPU Utilization issue when the HashMap was re-structured.

  16. #16 Don Munro
    on Mar 6th, 2007 at 9:04 am

    Interesting read.

    Readers may find it of interest to note that we are looking at a case where HashSet add() operations are also resulting in an infinite loop within the put() of the the underlying HashMap. To add to the mess, there is concurrent access involved but proper synchronization is in place and no get() operations are ever performed before the loop is encountered.

    A quick look has shown that the hashcode() and equals() implementations for the mapped objects are weak at best. Without having looked hard at the HashMap code I’m working on a theory that this above mentioned failure is resulting in a collision that ends in an infinite loop. Anyone care to set me straight before I put much effort into this?

    Cheers,
    Don

  17. #17 Greg Luck
    on Jun 17th, 2008 at 5:04 pm

    We have also seen this issue in production with some custom code we have. I just diagnosed it today happening on a HashMap.get() which was unsynchronized.

    Thanks for the post Patrick. I agree with your analysis. The next link somehow gets linked back to another element in the list, so that it goes around forever. The most likely time is on resize.

    I also agree with Brian’s comments. Most people think they understand how to write threadsafe code. The only think I know how to do is to write a good concurrency test.

  18. #18 Concurrency and HashMap « Pit Falls in Java Development
    on Jun 29th, 2008 at 2:11 am

    [...] Finally i will leave you with this interesting bug in Java which says HashMap can get into infinite loop when used in muti Threaded Environment which further reiterates my point that not all the possible scenarios can be visualized in a multi threaded environment and therefore one should rely on basics rather than trying a smart creativity which has high probability of landing into trouble at some point in future. You can read details about the infinite loop problem here. [...]

  19. #19 Michael B
    on Jul 1st, 2008 at 1:53 pm

    Synchronisation is about Atomicity, Visibility and Ordering, not just Atomicity alone. It is a honest mistake though.

    http://www.javaspecialists.eu/archive/Issue146.html is a good read.
    http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html is a nice test case to see if you ‘get it’.

  20. #20 Bakuras
    on Jul 2nd, 2008 at 1:34 am

    The fact that “you don’t care about integrity of your data” does not mean that you don’t care about if your program will terminate. The problem is that javadoc states explicitly that the behavior of unsynchronized HashMap is UNPREDICTABLE (aka non deterministic) which also means that your program probably will not terminate at all.

    If HashMap is used correctly you will get correct behavior, since it is tested by millions and written by great programmers. So complaining about wrong usage of means is really pointless. You know fire have many usages …

    If you don’t care about integrity of your data why not to use some random bias if you want simulate non-deterministic behaviors.

  21. #21 Concurrency and HashMap
    on Jul 2nd, 2008 at 5:45 am

    [...] You can read details about the infinite loop problem here. [...]

  22. #22 MAno F.
    on Jul 4th, 2008 at 7:12 am

    I’m just curious, what kind of application did you wrote, if you did not care about data integrity. I think data integrity is one of the basic goals when writing ANY kind of real-world application (well, except of concurency tests & multithreading-learning/teaching purposes).

    As I read the JavaDoc again and again, there was no mention about data integrity. It just says: “if at least one thread makes structural modification, HashMap MUST BE synchronized”. No word about the possible consequences (not only “data integrity”), so this means “ANYTHING COULD HAPPEN, INCLUSIVE THINGS YOU DON’T EXPECT, when you don’t synchronize mutable HashMap accessed from more than one thread”. Inifinite loop is one possible consequence of that.

    Well, even the Collection Framework programmers do not know what everything could happen when you don’t keep their warning in mind, so why do they should write down every possible harm the unsynchronized usage of HashMap (accessed from several threads) could cause?

    For example - another possible damage I see is losing references to some stored objects when reordering the same linked list concurrently in two or more threads (which is the case this time).

    So unless you are just doing harmless experiments, I would suggest YOU ALWAYS SHOULD CARE ABOUT DATA INTEGRITY/CONSISTENCY.

  23. #23 MAno F.
    on Jul 4th, 2008 at 7:21 am

    One more note: trying to avoid the consequences of unsynchronized access to HashMap would certainly result in more time spent in get() method (and maybe other methods too), but the reason the HashMap is not synchronized (unlike Hashtable) is exactly: performance.

  24. #24 Stefan M
    on Jul 23rd, 2008 at 7:27 am

    “Note that this implementation is not synchronized. If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.”

    I atleast would like to know the why, before someone tells me “You must”. Im grown up and dont want to be treated like a child by these “great programmers”. Im grateful to the author for pointing out the specific problem and hope that all these top down arguers will run into a similar problem themselves.

    @MAno “I think data integrity is one of the basic goals when writing ANY kind of real-world application” thats your opinion and you may want to consider that there are other valid opinions out there than just yours.

    I agree fully with the author who is not blaming anyone, that the JavaDoc COULD be more verbose than a simple: “you must”.

  25. #25 Mark L
    on Aug 18th, 2008 at 8:41 pm

    I too assumed that the data was safe, much to my peril!

    I am now converting to ConcurrentHashMap. I am assuming this will resolve the issue - has anybody found this not to be the case?

  26. #26 Maninder Batth
    on Aug 29th, 2008 at 6:56 am

    Patrick,
    All i can say is i am speechless. I made exact same assumptions as you because, in my application too, the risk of data corruption was managable and easily rectified, hence i choose not to synchronize hashmap and the remainder is the history.
    BTW, can you please suggest a cure… I still do not want to synchronize, atleast not externally.

  27. #27 plightbo
    on Aug 29th, 2008 at 6:58 am

    Maninder,
    You’ll have to synchronize it. It shouldn’t really cause much of a hit, so just go for it. Or, use the new java.util.concurrent collections and they’ll manage it for you. Good luck!

  28. #28 Mark L
    on Aug 29th, 2008 at 12:41 pm

    I did convert to ConcurrentHashMap which is in new java.util.concurrent collections. I performed a few tests to check for memory and performance differences of both HashMap and ConcurrentHashMap and they appeared to me to perform roughly about the same.

    ConcurrentHashMap sure has solved my problem with this.

    Thanks for your original post all them years ago Patrick.

  29. #29 Brian Egge
    on Oct 2nd, 2008 at 6:25 pm

    A similar, but no less fun, loop can occur within DateFormat. Concurrent use generally leads to garbage in your formatted dates, but can upon occasion cause the DateFormat to get caught in a loop.

    If multiple threads access a formatter concurrently, it must be synchronized externally.

    I feel the ‘must’ in the JavaDoc is a pretty good warning to say your worst nightmares may come true if you break the rule. Because the implementation of a class is allowed to change, the JavaDoc shouldn’t have to specify exactly what the worst case scenario is. In fact, if you haven’t designed a class with concurrency in mind, it can be really hard to anticipate what the worst case scenario is.

Leave a Comment