tayarockstar.blogg.se - Java string to codepoints

To be clear, I know very little about character-encoding! So, please forgive me if I get anything blatantly wrong here. But, from what I understand, the Unicode standard has evolved over time from a fixed-width, 16-bit character model to one in which CodePoints range from U+0000 to U+10FFFF and are stored using variable-width encodings.

And, because I will almost certainly mangle any deeper explanation, here's a snippet from Java's Character class:

"The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode Standard.) The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF)."

Now, ColdFusion has two built-in functions for dealing with characters and their CodePoint representations: chr() and asc(). However, as I stated above, this only works for characters in the Basic Multilingual Plane (BMP).
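
To make the surrogate-pair idea a bit more concrete, here is a minimal Java sketch of going from a String to its CodePoints and back. The class name CodePointDemo and the helper names toCodePoints() and fromCodePoints() are made up for illustration. The calls they wrap (String.codePoints(), String.codePointCount(), and the String(int[], int, int) constructor) are standard, and the sketch assumes Java 8 or later.

```java
import java.util.Arrays;

class CodePointDemo {

    // Hypothetical helper: returns one int per Unicode code point,
    // treating each valid surrogate pair as a single code point.
    static int[] toCodePoints(String text) {
        return text.codePoints().toArray();
    }

    // Hypothetical helper: rebuilds a String from code points. Supplementary
    // code points are written back out as surrogate pairs.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // The letter "a" followed by U+1F600, spelled as its surrogate pair.
        String text = "a\uD83D\uDE00";

        // length() counts 16-bit char values, not code points.
        System.out.println(text.length());                          // 3
        System.out.println(text.codePointCount(0, text.length()));  // 2
        System.out.println(Arrays.toString(toCodePoints(text)));    // [97, 128512]

        // Round trip: code points back to the original String.
        System.out.println(fromCodePoints(toCodePoints(text)).equals(text)); // true
    }
}
```

The thing to notice is that length() reports 3 while there are only 2 CodePoints. Anything that walks a String one char at a time will see the two halves of the surrogate pair as separate values, which is the same limitation that shows up with BMP-only functions.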