Thursday, September 17, 2009

Tips N Trix: How to Convert a String into a Byte Array

Encoding types

There are a number of ways to represent a string variable in binary form. The encoding object you use depends on the encoding you select.

Here are the encoding types that developers use most often (a short sketch comparing their output sizes appears after the list):
  • ASCII: Encodes each character in a string using seven bits. This encoding cannot represent extended Unicode characters.
  • Full Unicode (UTF-16): Represents each character using 16-bit code units, so the resulting byte array has two bytes for each ordinary character (characters outside the Basic Multilingual Plane take four bytes).
  • UTF-7: Uses seven bits for ordinary ASCII characters and multi-character seven-bit sequences for extended characters. It is most often used with seven-bit protocols such as mail.
  • UTF-8: Uses eight bits (one byte) for ordinary ASCII characters and multi-byte sequences for extended characters, so the resulting byte array has one byte per character as long as no extended characters appear.
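To make those size differences concrete, here is a minimal C# sketch (not part of the original post; the sample string and class name are illustrative) that asks each encoding how many bytes an ASCII-only string would require:

using System;
using System.Text;

class EncodingSizes
{
    static void Main()
    {
        // "Hello" contains only ordinary ASCII characters.
        string text = "Hello";

        // ASCII, UTF-8, and UTF-7 need one byte per character here;
        // UTF-16 (Encoding.Unicode) needs two bytes per character.
        Console.WriteLine(Encoding.ASCII.GetByteCount(text));   // 5
        Console.WriteLine(Encoding.UTF8.GetByteCount(text));    // 5
        Console.WriteLine(Encoding.UTF7.GetByteCount(text));    // 5
        Console.WriteLine(Encoding.Unicode.GetByteCount(text)); // 10
    }
}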
.NET provides a class for each of these encodings in the System.Text namespace. To encode a string into a byte array, create the appropriate encoding object and call its GetBytes method. See the example in Listing A.
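Listing A is not reproduced in this excerpt; the following C# sketch illustrates the approach the paragraph describes, assuming a UTF8Encoding instance (the string literal and class name are illustrative):

using System;
using System.Text;

class ListingAExample
{
    static void Main()
    {
        string text = "Convert me to bytes";

        // Create an encoding object explicitly, then call GetBytes.
        UTF8Encoding utf8 = new UTF8Encoding();
        byte[] bytes = utf8.GetBytes(text);

        Console.WriteLine(bytes.Length); // 19 bytes for this ASCII-only string

        // GetString performs the reverse conversion, byte array back to string.
        string roundTrip = utf8.GetString(bytes);
        Console.WriteLine(roundTrip);
    }
}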
You can also access a pre-instantiated encoding object through shared (static) properties of the base System.Text.Encoding class, such as Encoding.UTF8. See the example in Listing B.
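Listing B is likewise not shown here; a minimal sketch of that shared-property approach (class name and string are illustrative) could look like this:

using System;
using System.Text;

class ListingBExample
{
    static void Main()
    {
        string text = "Convert me to bytes";

        // Use the pre-instantiated encoder exposed as a shared (static) property.
        byte[] bytes = Encoding.UTF8.GetBytes(text);

        Console.WriteLine(bytes.Length); // 19
    }
}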
Note: In .NET, UTF-8 is the preferred encoding because it supports the full range of Unicode characters and uses a variable-length format that keeps the binary data small when no extended characters are used. When encoding ordinary ASCII characters, UTF-8 and ASCII produce the same result. In addition, .NET classes such as StreamReader and StreamWriter use UTF-8 encoding by default when reading from or writing to a stream.
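A quick way to verify that UTF-8 and ASCII agree on ordinary characters is the following sketch (again illustrative, not from the post; it assumes an ASCII-only sample string):

using System;
using System.Linq;
using System.Text;

class Utf8VsAscii
{
    static void Main()
    {
        string text = "plain ASCII only";

        byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

        // For text limited to ASCII characters the two encodings
        // produce byte-for-byte identical output.
        Console.WriteLine(asciiBytes.SequenceEqual(utf8Bytes)); // True
    }
}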
