It has been a long time since I last posted on this blog, so I planned to write about a small topic: string interning.
If you are a Java programmer, you already know that strings are immutable. Also, all string literals are interned in Java. That means identical literals are never duplicated: every occurrence of the same literal refers to a single String object.
String s1 = "satya";
String s2 = "satya";
In the above code snippet, s1 and s2 both refer to the same object: the string in the string table (pool) that the JVM maintains.
String s1 = new String("satya");
String s2 = new String("satya");
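In the above code snippet, s1 and s2 refer to different objects, because new String(...) always creates a new object; only the literal passed to it is pooled.
In the first case (literals), comparing s1 and s2 with s1 == s2 is enough, because both refer to the same object. This is cheaper than s1.equals(s2): == is a single reference comparison, while equals() may have to compare the two strings character by character.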
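A minimal program putting the two snippets above side by side; the class name LiteralIdentity is just for illustration. It shows that literals share one object, that new String(...) yields distinct objects, and that equals() still sees equal contents.

```java
public class LiteralIdentity {
    public static void main(String[] args) {
        // Literals: the JVM keeps one String object per distinct literal value.
        String lit1 = "satya";
        String lit2 = "satya";

        // new String(...) always allocates a fresh object, even from a literal argument.
        String obj1 = new String("satya");
        String obj2 = new String("satya");

        System.out.println(lit1 == lit2);      // true: same pooled object
        System.out.println(obj1 == obj2);      // false: two distinct objects
        System.out.println(obj1.equals(obj2)); // true: same characters
        System.out.println(lit1 == obj1);      // false: pooled object vs fresh copy
    }
}
```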
What if you want to compare all the strings in your application with == instead of String.equals()?
What you need to do is:
String s1 = new String("satya").intern();
String s2 = new String("satya").intern();
String.intern() returns the canonical copy of a string from the string table: if an equal string is already in the table, it returns that existing string; otherwise it adds this string to the table and returns it. By calling intern() on every string created in your application, you can safely use == instead of String.equals() to compare strings. But is this approach good? Definitely not in all cases.
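A small sketch of the behaviour just described (the class name InternDemo is illustrative). Because the literal "satya" already sits in the string table, intern() on any equal string returns that same pooled object, while a string built at run time is a separate object until it is interned.

```java
public class InternDemo {
    public static void main(String[] args) {
        String s1 = new String("satya").intern();
        String s2 = new String("satya").intern();
        // Both calls return the canonical copy from the string table.
        System.out.println(s1 == s2);       // true
        // The literal "satya" is already pooled, so intern() hands back that object.
        System.out.println(s1 == "satya");  // true

        // A string assembled at run time is a new object...
        String built = new StringBuilder("sat").append("ya").toString();
        System.out.println(built == s1);          // false
        // ...until it is interned.
        System.out.println(built.intern() == s1); // true
    }
}
```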
The obvious dangers with intern() are:
- Each call to intern() has to look up an equivalent string in the string table.
- A single intern() call might be costlier than a single equals() call.
- Your application might call equals() far less often than it would have to call intern(), so interning everything can cost more than it saves.
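To make the lookup cost in the list above concrete, here is a hypothetical heap-based canonicalizer that follows the same contract as intern(). This is a swapped-in illustration built on ConcurrentHashMap, not how the JVM's own string table is implemented (that table is maintained inside the JVM itself); the class MapInterner is invented for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// HYPOTHETICAL: a map-based model of intern()'s contract, not the JVM's string table.
public class MapInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Every call hashes the string and probes the map; when an entry with the
    // same hash exists, the map also compares characters via equals().
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s); // null if s was newly added
        return existing != null ? existing : s;
    }

    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = interner.intern(new String("satya"));
        String b = interner.intern(new String("satya"));
        System.out.println(a == b);          // true: second call returns the first copy
        System.out.println(interner.size()); // 1
    }
}
```

Every call pays for hashing and a map probe even when the string turns out to be new, which is exactly the per-call overhead the list warns about.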
The only case where intern() can help you is when your application has a small set (a few hundred) of highly repeated strings (one string repeated a million times across different objects) and those strings are created at run time. In that case, you can call intern() on every string you create at run time. This gives you a reference to the string in the string pool, and when you want to compare two strings within that set, you can use == instead of .equals().
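A sketch of that scenario under made-up assumptions: the hypothetical method readStatus stands in for a parser or network reader that produces a new String object for every record, and the three status values are invented sample data. Counting objects by reference shows how interning collapses many equal strings into one object per distinct value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class RepeatedValues {
    // HYPOTHETICAL input source: returns a fresh String object on every call,
    // like a field decoded from a file or a socket.
    static String readStatus(int i) {
        String[] statuses = {"NEW", "OPEN", "CLOSED"};
        return new String(statuses[i % statuses.length]);
    }

    static List<String> load(int count, boolean intern) {
        List<String> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String s = readStatus(i);
            values.add(intern ? s.intern() : s);
        }
        return values;
    }

    // Counts distinct objects (by reference), not distinct contents.
    static int distinctObjects(List<String> values) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(values);
        return identities.size();
    }

    public static void main(String[] args) {
        List<String> plain = load(100_000, false);
        List<String> interned = load(100_000, true);
        System.out.println(distinctObjects(plain));    // 100000
        System.out.println(distinctObjects(interned)); // 3
        // Within the interned set, == is a valid equality check.
        System.out.println(interned.get(0) == interned.get(3)); // true: both "NEW"
    }
}
```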
This strictly depends on how many distinct strings you are going to intern, because all interned strings are pushed into the string table, and this table lives in the permanent generation of the JVM (this holds for HotSpot up to Java 6; from Java 7 onwards interned strings are kept in the regular heap). If you push too many strings into the permanent generation, it fills up and a full GC is triggered, which is exactly what you don't want if your application has a response-time SLA.
So, in that case, you would save a lot of memory and prevent many strings with the same content from being promoted to the tenured generation. And remember that a String carries a good amount of overhead due to the fields it maintains. So, if you think “satya” is 5 bytes, you are clean bowled; if you think “satya” is 10 bytes, you are stumped. Just look at the fields of the String class to see its real size: in the JDK 6 implementation, a String holds a 4-byte count, a 4-byte hash code, a 4-byte offset and a reference to a separate char[] array, plus its own object header (whose size depends on the JVM architecture); the char[] in turn has its own header and stores 2 bytes per character.
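A back-of-the-envelope sketch of that arithmetic. All constants below are assumptions chosen for illustration (roughly a 64-bit JVM with compressed references and 8-byte object alignment, applied to the JDK 6 String layout); real sizes vary by JVM and version, so treat the result as an estimate, not a measurement. The class StringSizeEstimate is hypothetical.

```java
public class StringSizeEstimate {
    // ASSUMPTIONS, not measured values:
    static final int OBJECT_HEADER = 12; // bytes per object header
    static final int ARRAY_HEADER = 16;  // object header plus 4-byte array length
    static final int REFERENCE = 4;      // compressed object reference
    static final int ALIGNMENT = 8;      // objects padded to a multiple of 8 bytes

    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // JDK 6-style String: header + char[] reference + count + hash + offset.
    static long stringObjectBytes() {
        return align(OBJECT_HEADER + REFERENCE + 4 + 4 + 4);
    }

    // The separate char[]: header plus 2 bytes (UTF-16) per character.
    static long charArrayBytes(int length) {
        return align(ARRAY_HEADER + 2L * length);
    }

    static long totalBytes(String s) {
        return stringObjectBytes() + charArrayBytes(s.length());
    }

    public static void main(String[] args) {
        long copies = 1_000_000L;
        long perCopy = totalBytes("satya");
        System.out.println("Estimated bytes per \"satya\": " + perCopy);          // 64
        System.out.println("1M separate copies: " + perCopy * copies + " bytes"); // 64000000
        // With interning, the 1M owning objects share one copy
        // (their reference fields are needed either way).
        System.out.println("1 interned copy: " + perCopy + " bytes");             // 64
    }
}
```

Under these assumptions, the 5 characters account for only 10 of the 64 bytes; the rest is headers, fields and padding.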
So, if you have a small set of highly repeated strings and you are optimizing for memory, intern them.
Hope that helps!