When a character's code point is above 0xFFFF, Java's String.length() does not count it as a single character, at least in Java 6.

The Wikipedia page explaining UTF-8, "http://en.wikipedia.org/wiki/UTF8", uses the character "𤭢" as an example. Its Unicode code point, U+24B62, is above 0xFFFF.

Java's String.length() usually works well for Unicode characters: "好", for example, has length 1. For the special character "𤭢", however, it returns 2. String.getBytes(Charset.forName("UTF-8")) shows that the string is stored correctly and can be exported through methods such as getBytes, which implies that the discrepancy comes from the length function itself: it counts 16-bit char values (UTF-16 code units) rather than characters, and a character above 0xFFFF takes two of them.
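
To make the comparison concrete, here is a minimal sketch that prints the three counts side by side. The class name SupplementaryLength and its helper methods are made up for illustration; the library calls (String.length, String.codePointCount, String.getBytes) are standard and available in Java 6. The special character is written as its surrogate pair \uD852\uDF62 so that the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; only the String and Charset calls come from the JDK.
public class SupplementaryLength {

    // Number of UTF-16 code units (char values); this is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes in the UTF-8 encoding of the string.
    static int utf8Bytes(String s) {
        return s.getBytes(Charset.forName("UTF-8")).length;
    }

    static void report(String label, String s) {
        System.out.println(label
                + ": length=" + codeUnits(s)
                + ", codePoints=" + codePoints(s)
                + ", utf8Bytes=" + utf8Bytes(s));
    }

    public static void main(String[] args) {
        report("U+597D", "\u597D");         // the character 好
        report("U+24B62", "\uD852\uDF62");  // the character 𤭢 as a surrogate pair
    }
}
```

Running it should print length=1, codePoints=1, utf8Bytes=3 for the first string and length=2, codePoints=1, utf8Bytes=4 for the second. So if the goal is to count characters as a reader would see them, codePointCount looks like the call to use instead of length.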