When you create a String from a byte array without naming a charset, the JVM uses Charset.defaultCharset() to decode the bytes. Another constructor lets you specify any Charset that is available on your machine.
/*
s1 and s2 deliver the same results
*/
String s1 = new String("hello".getBytes());
String s2 = new String("hello".getBytes(), Charset.defaultCharset());
// specific encoding; and yes, the ü character is there on purpose
// (note: getBytes() without an argument encodes with the default charset)
String s3 = new String("grün".getBytes(), "UTF-8");
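To make the difference between the default and an explicit charset easier to try out, here is a minimal sketch. The class and method names (StringCharsetSketch, decodeWithDefault, roundTripUtf8) are made up for this example, and it assumes Java 7 or newer so that StandardCharsets is available. The ü is written as \u00fc so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
class StringCharsetSketch {

    // Decodes bytes with the platform default charset, spelled out explicitly.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    // Encodes and decodes with the same named charset, so the text survives
    // the round trip regardless of the platform default.
    static String roundTripUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] helloBytes = "hello".getBytes(); // encoded with the default charset
        // Both constructors decode with the default charset, so the results match.
        System.out.println(new String(helloBytes).equals(decodeWithDefault(helloBytes)));
        // "gr\u00fcn" is "grün"; the round trip keeps all four characters.
        System.out.println(roundTripUtf8("gr\u00fcn").length());
    }
}
```

Using the StandardCharsets constants instead of the name "UTF-8" also avoids the checked UnsupportedEncodingException.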
Conversion
A charset has only a limited set of byte values, which have to encode a probably much larger number of characters. Therefore UTF-8 encodes the umlaut ü as two bytes (0xC3 0xBC), which show up as Ã¼ when they are read as ISO-8859-1. In Java there are several ways to convert from one encoding to another. Let's assume you have a String with the value 'grÃ¼n', that is, UTF-8 bytes that were decoded as ISO-8859-1 (see the last line of listing 1). To get rid of garbled characters like Ã¼, you have to encode the String with, for example, ISO-8859-1 and decode the resulting bytes as UTF-8.
// re-encode s as ISO-8859-1, then decode those bytes as UTF-8
System.out.println(new String(s.getBytes("ISO-8859-1"), "UTF-8"));
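The one-liner above can be tried end to end with the following sketch. It first produces the damage on purpose and then reverses it. The names MojibakeRepairSketch, garble and repair are my own and only serve the illustration; the sketch assumes that the damage really came from reading UTF-8 bytes as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
class MojibakeRepairSketch {

    // Simulates the damage: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage: recover the original bytes via ISO-8859-1,
    // then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "gr\u00fcn";      // "grün"
        String garbled = garble(original);  // the ü became two characters
        System.out.println(garbled.length());                 // prints 5
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

This only works because ISO-8859-1 maps every byte value to exactly one character, so no information gets lost on the way back.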
As far as I know, there is no method that can detect the encoding of a String. You can only test whether its bytes are valid UTF-8, which means they contain nothing but sequences that UTF-8 allows. Even then, the same bytes are probably valid in ISO-8859-* too, so the test cannot tell you which encoding was intended.
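One way to run such a UTF-8 test is a strict CharsetDecoder, sketched below. The class name Utf8CheckSketch and the method isValidUtf8 are assumptions of mine, not an existing API. Note that the check works on a byte array, because a Java String is already decoded text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
class Utf8CheckSketch {

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "gr\u00fcn".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "gr\u00fcn".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // prints true
        System.out.println(isValidUtf8(latin1)); // prints false, 0xFC alone is not valid UTF-8
    }
}
```

Plain ASCII text such as "hello" passes this check and is valid ISO-8859-* at the same time, so a positive result says "could be UTF-8", not "is UTF-8".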
[unfinished]