
Java Permgen space, String.intern, XML parsing

Posted by haakon, Sat, 09 Sep 2006 08:16:00 GMT

This week I have been poking through the innards of a web application trying to find out why we were leaking memory (in the permanent generation) like crazy. After a bit of digging I isolated it down to a line that looked like this:

Document doc = SAXParser.new().parse( stringContainingXML );

My first inclination was to blame the parser. Everyone knows that XML parsers are troublemakers, right? But, in the end I had to conclude the leak was entirely the fault of our code. But I learned a bit along the way! The details:

Permgen space – what is it?

The memory a JVM uses is split up into three “generations”: young (eden plus the survivor spaces), tenured, and permanent. This is done to improve the performance of garbage collection. Most objects are short lived (temporaries created inside a method, etc.), and so they come and go in the young generation. Some objects (like things in caches) stick around for a while and get promoted from the young to the tenured generation. Some things live “forever”, like the classes themselves, and “interned” strings. These go straight into the permanent generation.

Most memory leaks involve normal objects, and you run out of heap space by filling up the young and tenured memory spaces. Sometimes though, you might see a “java.lang.OutOfMemoryError: PermGen space” failure. The most common cause is that you simply don’t have enough space to load up all your classes. Use the param ‘-XX:MaxPermSize=100m’ to adjust to a desired value. You may also find that doing a hot deploy of a WAR into Tomcat eventually uses up permgen space. That is a different issue which I won’t discuss here.

If you observe that your app is leaking permgen space just while it is running (and not because you are hot deploying a war), then you have an interesting problem. The issue is most likely to be either an errant ClassLoader, or String.intern gone awry. ClassLoaders are an interesting beast, but our problem was with interned strings.

What is String.intern?

String.intern is an optimization feature. Doing a double equals (==) compare of two strings is a common mistake people make, as they forget that this is doing an identity comparison. (a == b) is checking if a and b are in fact the same object. Usually, what you really want to do is check if (a.equals(b)). This does the character by character comparison that you probably want.
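The difference between the two comparisons can be seen in a minimal sketch. The class and method names here are made up for illustration; `new String(...)` is used only to force two distinct objects with the same characters.

```java
class StringCompareDemo {
    // Identity comparison: true only if both references point at the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Content comparison: true if both strings hold the same characters.
    static boolean sameChars(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Two separate String objects that happen to hold the same text.
        String a = new String("permgen");
        String b = new String("permgen");

        System.out.println("a == b      : " + sameObject(a, b));
        System.out.println("a.equals(b) : " + sameChars(a, b));
    }
}
```

The identity check reports that the two objects differ even though their contents are equal, which is exactly the mistake described above.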

The thing is, the latter comparison is much slower than an identity comparison. So, a nice performance optimization can be to maintain a canonical list of strings that allows you to do the fast identity comparisons instead. It would be easy enough to write such a thing for yourself, but it is included in Java these days with the String.intern method. So Java maintains a pool of these “canonical” strings to allow you to get some better performance when dealing with strings. But, this pool lives in the permgen space!
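As a rough sketch of the idea, the example below compares String.intern with a hand-written canonical pool. The `StringPool` class is hypothetical and exists only for illustration. Unlike the JVM's pool, it lives on the ordinary heap and can be thrown away when it is no longer needed.

```java
import java.util.HashMap;
import java.util.Map;

class CanonicalPoolDemo {
    // Hypothetical hand-rolled pool: maps each distinct string to one canonical instance.
    static final class StringPool {
        private final Map<String, String> pool = new HashMap<>();

        // Returns the canonical instance for s, registering s if it is new.
        String canonical(String s) {
            String existing = pool.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }

        int size() {
            return pool.size();
        }
    }

    public static void main(String[] args) {
        String a = new String("element");
        String b = new String("element");

        // Different objects, so the identity comparison fails.
        System.out.println("a == b                   : " + (a == b));

        // After interning, both references point at the JVM's canonical copy.
        System.out.println("a.intern() == b.intern() : " + (a.intern() == b.intern()));

        // The hand-written pool gives the same guarantee.
        StringPool pool = new StringPool();
        System.out.println("pooled identity          : " + (pool.canonical(a) == pool.canonical(b)));
    }
}
```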

Why not intern all strings?

A natural question might be why one shouldn’t just intern every string. Well, there are two reasons why this wouldn’t work. First, you have finite memory. If you stored every string you ever saw into permgen space with intern, you would run out of memory reasonably quickly. Second, the reason you are using intern in the first place is as a performance optimization. It happens to be faster to retrieve the canonical string from the intern string pool than it is to do a character by character string comparison. However, as the intern string pool grows without bound, finding your string in the pool would probably eventually cost more than just doing the character comparison. So, you only want to intern strings which you use frequently throughout the life of your app.

XML parsers seem to use String.intern (or something similar)

XML parsing just happens to be a whole lot of string parsing. So, it is not surprising to find that parsers take advantage of intern. But, we just said that you probably don’t want to intern every string you see, so what does a parser like Xerces intern? According to (http://xerces.apache.org/xerces2-j/features.html), “All element names, prefixes, attribute names, namespace URIs, and local names are internalized using the java.lang.String#intern(String):String method”. These are all the strings that are going to be seen repeatedly when parsing multiple XML documents with the same DTD. Notice that they don’t intern attribute values and tag contents. These are what change from document to document; they are your actual data. To intern these would be to intern your entire data space, and we would be facing the previously mentioned problem of effectively interning all strings.
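The behaviour quoted above is exposed through the standard SAX2 feature `http://xml.org/sax/features/string-interning`. The following is a minimal sketch that asks whichever SAX parser the JDK hands out how it treats that feature. The class and method names are made up for illustration. The outcome depends on the parser implementation, so the code reports what happens rather than assuming a result.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class StringInterningProbe {
    // Standard SAX2 feature name for string interning.
    static final String FEATURE = "http://xml.org/sax/features/string-interning";

    // Reads the feature, then tries to switch it off, and describes both outcomes.
    static String describe(XMLReader reader) {
        StringBuilder out = new StringBuilder();
        try {
            out.append("current value: ").append(reader.getFeature(FEATURE));
        } catch (SAXException e) {
            out.append("cannot read: ").append(e.getClass().getSimpleName());
        }
        try {
            reader.setFeature(FEATURE, false);
            out.append(", switching off: accepted");
        } catch (SAXException e) {
            // Parsers may refuse with SAXNotSupportedException or SAXNotRecognizedException.
            out.append(", switching off: refused (")
               .append(e.getClass().getSimpleName()).append(")");
        }
        return out.toString();
    }

    // Convenience wrapper around the default JAXP SAX parser.
    static String describeDefaultParser() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return describe(reader);
        } catch (Exception e) {
            throw new IllegalStateException("could not create a SAX parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeDefaultParser());
    }
}
```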

Our problem

At last we arrive at our problem. We were parsing XML documents and finding that our permgen was steadily growing. At first we just enlarged permgen, assuming we had a lot of classes to load. But when we were blowing up with 500 megs of permgen space used up, it was time to find the problem.

After a bunch of digging, what we found was this. The XML we were parsing was not really XML. It was well formed (tags opened and closed properly, nested properly, etc). But, it was XML for which it would be impossible to write a DTD because the data lived in the tag space. An example will show it best. We had tags that looked like:

<data.6541237895.field1>field one val</data.6541237895.field1>
<data.6541237895.field2>field two val</data.6541237895.field2>
<data.7813329781.field1>field one val</data.7813329781.field1>
<data.7813329781.field2>field two val</data.7813329781.field2>
...

The numbers inside the tags themselves were data! So, there was no finite set of tags that could exist in an XML document of this form. Rather, you could have as many tags as could be represented by a ten digit number. To make it worse, there were different values like “foobar” and “name” and many others for each number. The details are boring, but the important bit was that our tag space was as big as our data space. The XML parser was merrily interning every tag string it saw as a reasonable performance optimization. But, as our XML was not true XML, everything came crashing down.
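To make the “tag space as big as data space” point concrete, here is a small sketch that generates documents shaped like the sample above and counts how many distinct element names a SAX parser reports. The class name, the `root` wrapper element and the generated ids are assumptions for illustration only. The sketch counts names; it does not measure memory.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TagSpaceDemo {
    // Builds a document with two uniquely named elements per record,
    // mimicking tags such as <data.6541237895.field1>.
    static String buildDocument(int records) {
        StringBuilder xml = new StringBuilder("<root>");
        for (int i = 0; i < records; i++) {
            long id = 6541237895L + i; // made-up ids, one per record
            for (int field = 1; field <= 2; field++) {
                String tag = "data." + id + ".field" + field;
                xml.append('<').append(tag).append('>')
                   .append("val")
                   .append("</").append(tag).append('>');
            }
        }
        return xml.append("</root>").toString();
    }

    // Parses the document and returns how many different element names it contains.
    static int countDistinctElementNames(String xml) {
        final Set<String> names = new HashSet<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return names.size();
    }

    public static void main(String[] args) {
        for (int records : new int[] {10, 100, 1000}) {
            System.out.println(records + " records -> "
                + countDistinctElementNames(buildDocument(records)) + " distinct element names");
        }
    }
}
```

The number of distinct names grows in step with the number of records, so a parser that interns element names ends up interning something proportional to the data.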

So how to fix it?

  1. Maybe the best solution would be to fix the bad XML. In this case, we were not the source of the XML so this ideal option was not practical.
  2. Supposedly one can turn off the “feature” of interning via the SAX parser interface in Java. In practice, none of the parsers we tried allowed us to turn it off (let me know if you find one that does!).
  3. It would be nice if the interned strings could just be garbage collected like any other Java memory. I’ve seen conflicting reports on this. This article appears to show that interned strings can be collected.
  4. Don’t use an XML parser if you aren’t really parsing XML.

Number 4 may seem like a copout, but it is the option we landed on. We now use a few regular expressions to pull the data we need from the “XML” document. This happens both to fix our memory problem and to improve performance. Apparently selecting just the parts of the document we need with a regex is faster than parsing the whole thing with an XML parser.
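A regex-based extraction along these lines might look like the sketch below. The pattern is an assumption based only on the sample tags shown earlier, not the expressions actually used in the application, and it makes no attempt to handle entities, attributes or nested elements.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtractDemo {
    // Assumed shape: <data.ID.FIELD>VALUE</data.ID.FIELD>
    // The back-references \1 and \2 require the closing tag to match the opening one.
    private static final Pattern TAG =
        Pattern.compile("<data\\.(\\d+)\\.(\\w+)>([^<]*)</data\\.\\1\\.\\2>");

    // Returns a map from "ID.FIELD" to the text between the tags, in document order.
    static Map<String, String> extract(String doc) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = TAG.matcher(doc);
        while (m.find()) {
            values.put(m.group(1) + "." + m.group(2), m.group(3));
        }
        return values;
    }

    public static void main(String[] args) {
        String doc =
            "<data.6541237895.field1>field one val</data.6541237895.field1>\n"
          + "<data.6541237895.field2>field two val</data.6541237895.field2>\n";
        for (Map.Entry<String, String> e : extract(doc).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Because the tag names are only ever held as ordinary substrings, nothing is interned and the keys can be garbage collected like any other object.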

How to find these problems

Tracking down these problems can be challenging:

  1. Profiling is your friend. Find a good profiler and learn how to use it (JProfiler works nicely).
  2. jmap and jstat are useful tools that come with the JDK. They give you info about memory usage, etc.
  3. visualgc (jvmstat) is a nice tool for seeing an overall picture of your memory usage.
  4. Understand how garbage collection works
    (http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html)
  5. Get familiar with JVM args that help with this kind of debugging and performance optimization: verbose GC logging, tracing of class loading, etc.
    (http://java.sun.com/docs/hotspot/gc1.4.2/faq.html, http://www.brokenbuild.com/blog/2006/08/04/java-jvm-gc-permgen-and-memory-options/)
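Alongside the external tools listed above, the JVM can report on its own memory pools through the standard java.lang.management API. The sketch below prints one line per pool; the class name is made up for illustration, and the pool names it prints depend on the JVM version and garbage collector in use, so treat the output as something to inspect rather than rely on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MemoryPoolReport {
    // Builds a one-line-per-pool summary of the JVM's memory pools.
    static String report() {
        StringBuilder out = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + " KB";
            out.append(String.format("%-30s %-16s used=%d KB max=%s%n",
                pool.getName(), pool.getType(), usage.getUsed() / 1024, max));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling `report()` periodically from inside the application and watching which pool keeps growing is a cheap way to tell a heap leak from a leak in the non-heap pools.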

~haakon

Comments

[...] I got pingback from haakon on a great post entitled Java Permgen space, String.intern, XML parsing. I always love a good debugging session story, when you throw in a memory leak that’s an instant Nerdgasm. This post covers String.intern() , a bit more about Classloaders, and leaves you with a great tools and techniques checklist for solving memory problems in Java. [...]

Wes Maldonado: Data Junkie » Blog Archive » Java JVM, PermGen and String.intern(): More tips about dealing with those pesky OutOfMemory errors., 2 days later.

well written and informative article on permGen, xml parsing, string interning.

kb, 26 days later.

what are you talking about man….? I read until I could read no further and yes it was past the first sentence. Interesting to say the least and informative as well as educational. Now what are you talking about man?...

Steve, 26 days later.

I wrote this simple program that interns strings. It seems like the interned strings do get garbage collected. You should get an out-of-memory error from this program if they did not get garbage collected.

public class Intern
{
    public static void main(String[] args)
    {
        long counter = 0;
        String junk;
        try
        {
            StringBuffer sb = new StringBuffer();
            while (true)
            {
                // Build a brand-new string on every pass and intern it.
                counter++;
                sb.setLength(0);
                sb.append(counter);
                String s = new String(sb.toString());
                junk = s.intern();
                if ((counter % 100000) == 0)
                    System.out.println("current count " + counter);
            }
        }
        catch (Throwable e)
        {
            // Reached only if the JVM runs out of memory (or fails some other way).
            System.out.println("error in generation count " + counter + " exception " + e);
        }
    }
}

dc, about 1 month later.

very informative and enjoyable article.

when i saw the sample “xml” i really cracked. candidate for the “baddest data structure of the year”-award.

oblume, 3 months later.

Nice style of writing. Was fun to read and good to know when stumbling into such a programming situation.

Der Klempner

Der Klempner, 3 months later.

[...] After reading several resources (#1, #2, #3) we’d determined the cause of our PermGen Problems: [...]

deepthoughts.orsomethinglikethat » Run JBoss as a Windows Service on BEA’s JRockit JVM using Java Service Wrapper - not JavaService, 3 months later.

Hi, I found your blog via google by accident and have to admit that youve a really interesting blog :-) Just saved your feed in my reader, have a nice day :)

Florian, 4 months later.

String.intern is a memory saving feature to avoid having multiple identical Strings. From the Java Api doc:

“Returns a canonical representation for the string object.

A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification.
Returns:
    a string that has the same contents as this string, but is guaranteed to be from a pool of unique strings.”

And String.equals will not do a character-by-character comparison if the two references point at the same String.

And permgen space is not really ‘permanent’. It can be reclaimed – it’s just that the garbage collector does not check it as often as the other generations.

Jimbob, 5 months later.

Thanks for sharing your experience. Learned about interning. Never knew that stuff.

Satish, 8 months later.