Encoding types
There are several ways to represent a string in binary form, and the encoding object you use depends on the encoding you choose.
Here are the encoding types that developers use most often:
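- ASCII: Encodes each character in seven bits, stored as one byte per character. This encoding type cannot represent characters outside the 7-bit ASCII range (U+0000 to U+007F); by default, .NET's ASCIIEncoding replaces such characters with a question mark (?).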
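- Full Unicode (UTF-16): Represents each character with one 16-bit code unit, or with two code units (a surrogate pair) for characters outside the Basic Multilingual Plane. Because a .NET Char is itself a UTF-16 code unit, the resulting byte array has two bytes for each Char in the string.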
- UTF-7: Encodes most ordinary ASCII characters directly as seven-bit characters and represents all other characters as modified Base64 sequences that are themselves made of seven-bit ASCII characters. This encoding type is most often used with seven-bit protocols such as email.
- UTF-8: Uses one byte (eight bits) for each ordinary ASCII character and two to four bytes for other characters. A string that contains only ASCII characters therefore produces a byte array with one byte per character.
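The size differences described in the list can be checked with a short sketch. It is written in Python rather than a .NET language and assumes Python codec names as stand-ins for the .NET classes: "ascii" with errors="replace" for ASCIIEncoding, "utf-16-le" for UnicodeEncoding (whose GetBytes output is little-endian with no byte order mark), "utf-7" for UTF7Encoding, and "utf-8" for UTF8Encoding. The function names byte_counts and utf16_code_units are illustrative only.

```python
# Illustrative sketch: Python codecs used as stand-ins for the .NET encodings.
#   "ascii"     ~ ASCIIEncoding (errors="replace" substitutes "?", like .NET's default)
#   "utf-16-le" ~ UnicodeEncoding.GetBytes (little-endian, no byte order mark)
#   "utf-7"     ~ UTF7Encoding
#   "utf-8"     ~ UTF8Encoding
ENCODINGS = ["ascii", "utf-16-le", "utf-7", "utf-8"]


def byte_counts(text: str) -> dict[str, int]:
    """Return the number of bytes `text` occupies in each encoding."""
    return {name: len(text.encode(name, errors="replace")) for name in ENCODINGS}


def utf16_code_units(text: str) -> int:
    """Count 16-bit UTF-16 code units (what .NET stores as Char values)."""
    return len(text.encode("utf-16-le")) // 2


# byte_counts("Hello")      -> {'ascii': 5, 'utf-16-le': 10, 'utf-7': 5, 'utf-8': 5}
# byte_counts("H\u00e9llo") -> {'ascii': 5, 'utf-16-le': 10, 'utf-7': 9, 'utf-8': 6}
# utf16_code_units("\U0001F600") -> 2  (one emoji needs a surrogate pair)
```

Note how the accented "é" costs UTF-8 one extra byte, costs UTF-7 a four-byte escape sequence ("+AOk-"), and is simply lost (replaced by "?") in ASCII.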
.NET provides a class for each of these encodings in the System.Text namespace: ASCIIEncoding, UnicodeEncoding, UTF7Encoding, and UTF8Encoding. To encode a string into a byte array, create an instance of the appropriate class and call its GetBytes method. See the example in Listing A.
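The encode-then-decode workflow can be sketched the same way. This Python example is not the article's Listing A; it assumes str.encode stands in for GetBytes and bytes.decode for GetString, and round_trip is a hypothetical helper. It shows why the choice of encoding object matters: ASCII silently loses the accented character, while UTF-8 preserves it.

```python
# Hypothetical helper: str.encode plays the role of .NET's GetBytes and
# bytes.decode the role of GetString. errors="replace" approximates
# ASCIIEncoding's default of substituting "?" for unrepresentable characters.


def round_trip(text: str, encoding: str) -> tuple[bytes, str]:
    """Encode `text` with `encoding`, then decode the bytes again."""
    data = text.encode(encoding, errors="replace")
    return data, data.decode(encoding)


text = "Caf\u00e9"  # "Café"

ascii_bytes, ascii_text = round_trip(text, "ascii")
utf8_bytes, utf8_text = round_trip(text, "utf-8")

assert ascii_bytes == b"Caf?"           # the accented character is lost
assert utf8_bytes == b"Caf\xc3\xa9"     # "é" becomes two bytes in UTF-8
assert utf8_text == text                # and survives the round trip
```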
You can also get a pre-instantiated encoding object through the static (Shared in Visual Basic) properties of the base System.Text.Encoding class, such as Encoding.ASCII, Encoding.Unicode, Encoding.UTF7, and Encoding.UTF8. See the example in Listing B.
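Python has no exact counterpart to these shared properties, but looking a codec up once with codecs.lookup and reusing the returned CodecInfo object is a loose analogue. In the sketch below, SHARED and get_bytes are hypothetical names chosen to mirror the .NET properties; they are not part of any .NET or Python API.

```python
import codecs

# Hypothetical table of pre-looked-up codecs, keyed by names that mirror the
# .NET static properties (Encoding.ASCII, Encoding.Unicode, ...).
SHARED = {
    "ASCII": codecs.lookup("ascii"),
    "Unicode": codecs.lookup("utf-16-le"),  # .NET's Unicode is UTF-16 LE
    "UTF7": codecs.lookup("utf-7"),
    "UTF8": codecs.lookup("utf-8"),
}


def get_bytes(encoding_name: str, text: str) -> bytes:
    """Encode `text` with one of the shared codec objects."""
    codec = SHARED[encoding_name]
    data, _consumed = codec.encode(text)  # encode returns (bytes, length consumed)
    return data
```

For example, get_bytes("Unicode", "Hi") yields b"H\x00i\x00", two bytes per character, as described for UTF-16 above.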
Note: In .NET, UTF-8 is the preferred encoding because it supports the full range of Unicode characters and uses a variable-length format, so text with few or no non-ASCII characters produces less binary data than it would in UTF-16. For ordinary ASCII characters, UTF-8 and ASCII encoding produce identical bytes. In addition, .NET classes such as StreamReader and StreamWriter use UTF-8 by default when reading from or writing to a stream.
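The claims in the note can be checked with Python codecs standing in for the .NET classes. The write_text helper is hypothetical; it passes encoding="utf-8" explicitly because Python's default encoding for text streams may depend on the platform locale, whereas the note states that .NET's StreamReader and StreamWriter default to UTF-8.

```python
import io

ascii_text = "Plain ASCII text"

# For pure ASCII input, UTF-8 and ASCII produce identical bytes.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# UTF-8 needs one byte per ASCII character; UTF-16 needs two.
assert len(ascii_text.encode("utf-8")) == len(ascii_text)
assert len(ascii_text.encode("utf-16-le")) == 2 * len(ascii_text)

# UTF-8 is variable-length: 1 to 4 bytes per code point.
for ch, expected in [("A", 1), ("\u00e9", 2), ("\u20ac", 3), ("\U0001F600", 4)]:
    assert len(ch.encode("utf-8")) == expected


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Write `text` through a text stream and return the bytes it produced."""
    buffer = io.BytesIO()
    # Name the encoding explicitly: Python's default for text streams may
    # depend on the platform locale.
    writer = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    writer.write(text)
    writer.flush()
    data = buffer.getvalue()
    writer.detach()  # hand the buffer back without closing it
    return data
```

With the default argument, write_text("Caf\u00e9") returns b"Caf\xc3\xa9" with no byte order mark.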