aw_string_to_unicode

From ActiveWiki


Minimum requirements
Added in version 5.0
SDK build 80


WCHAR* aw_string_to_unicode (char *str)

Description

Converts a UTF-8 encoded string to a UTF-16 "wide character" (WCHAR) string. For the purposes of the SDK, these wide character strings are also called "Unicode" strings.

Callback

None

Notes

UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII.

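Each Unicode character is represented by 1 to 4 bytes when encoded as UTF-8: a character that fits in a single WCHAR takes 1 to 3 bytes, and a character that needs two WCHARs (a UTF-16 surrogate pair) takes 4 bytes. For example, the Unicode character 0x221E or "∞" (infinity) is encoded as the byte sequence E2 88 9E, which an application that treats the bytes as Windows-1252 text displays as "âˆž".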
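The encoding rule can be illustrated with a minimal sketch. The function name utf8_encode is hypothetical and is not part of the SDK; it only shows how a single code point maps to 1 to 4 bytes under the standard UTF-8 bit layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an SDK function.
 * Encodes one Unicode code point as UTF-8, writes 1 to 4 bytes to out,
 * and returns the byte count. Returns 0 if cp cannot be encoded
 * (a surrogate value or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* ASCII: 1 byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not characters */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x221E, buf) fills buf with E2 88 9E and returns 3, matching the infinity example above.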

UTF-8 encoding has two main advantages:

  • UTF-16 strings often contain bytes that are 0x00, which ASCII applications interpret as terminating null-characters. In UTF-8, only the Unicode character 0x0000 (the null-character) is encoded as 0x00.
  • ASCII codes 32 - 127 are encoded the same in UTF-8, which means that English text remains readable when UTF-8 is read by ASCII applications.
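The first point can be checked with a small sketch. The helper count_zero_bytes is hypothetical (not an SDK function); it counts the 0x00 bytes in a buffer so that the same text can be compared in both encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an SDK function.
 * Returns how many bytes in buf[0..len) are 0x00. */
size_t count_zero_bytes(const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t zeros = 0;
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; ++i) {
        if (p[i] == 0x00)
            ++zeros;
    }
    return zeros;
}
```

For the text "Hi", the two UTF-16 code units 0x0048 and 0x0069 each contain one zero byte regardless of byte order, so an ASCII application would stop reading at the first of them. The UTF-8 bytes 48 69 contain none.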

Caution should be exercised when using these Unicode strings across different platforms. Little-endian platforms, such as Intel processors, store the Unicode character "∞" (0x221E) in memory as the bytes 1E 22, while big-endian platforms (such as PowerPC/Motorola) store it as 22 1E. If you are not able to encode the string in UTF-8, your application may need to prefix it with a "byte order mark" so that the reader can determine the byte order.
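One way to avoid depending on the host byte order is to serialize each 16-bit unit explicitly. The sketch below is an illustration only; put_u16_le, put_u16_be and utf16_bom are hypothetical names, not SDK functions. It relies on the standard byte order mark U+FEFF, which appears as FF FE in little-endian data and FE FF in big-endian data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum utf16_order { UTF16_BOM_NONE = 0, UTF16_BOM_LE = 1, UTF16_BOM_BE = 2 };

/* Write one UTF-16 code unit as two bytes, least significant byte first. */
void put_u16_le(uint16_t unit, unsigned char out[2])
{
    assert(out != NULL);
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)(unit >> 8);
}

/* Write one UTF-16 code unit as two bytes, most significant byte first. */
void put_u16_be(uint16_t unit, unsigned char out[2])
{
    assert(out != NULL);
    out[0] = (unsigned char)(unit >> 8);
    out[1] = (unsigned char)(unit & 0xFF);
}

/* Inspect the first two bytes of a buffer for a byte order mark (U+FEFF). */
enum utf16_order utf16_bom(const unsigned char *bytes, size_t len)
{
    if (bytes == NULL || len < 2)
        return UTF16_BOM_NONE;
    if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        return UTF16_BOM_LE;
    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        return UTF16_BOM_BE;
    return UTF16_BOM_NONE;
}
```

A writer would emit the mark 0xFEFF first using the same function it uses for the text, and a reader would call utf16_bom on the first two bytes to decide how to reassemble the remaining units.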

Important: The returned pointer is only valid immediately after the call to this method and may be invalidated by subsequent calls. Therefore an application must copy the string to a local buffer if it needs to use the string for an extended period of time.
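Copying the result could look like the following sketch. It does not call the SDK, so that it compiles on its own: wide_unit is an assumed stand-in for WCHAR (a 16-bit unit), and copy_wide is a hypothetical helper. In an application, src would be the pointer returned by aw_string_to_unicode, and a destination of 2048 units would hold the maximum documented length plus the terminating null-character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the SDK's WCHAR, taken to be a 16-bit unit. */
typedef uint16_t wide_unit;

/* Hypothetical helper, not an SDK function.
 * Copies the null-terminated wide string src into dst, which has room for
 * dst_cap units including the terminator. The copy is truncated if needed
 * and dst is always null-terminated. Returns the number of units copied,
 * not counting the terminator. */
size_t copy_wide(wide_unit *dst, size_t dst_cap, const wide_unit *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && dst_cap > 0);

    while (i + 1 < dst_cap && src[i] != 0) {
        dst[i] = src[i];
        ++i;
    }
    dst[i] = 0;
    return i;
}
```

The copy should be made right after the conversion call and before any other SDK call, since the note above says later calls may invalidate the returned pointer.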

Arguments

str
UTF-8 encoded string. The maximum length is 2047 characters (i.e. 8189 bytes at most, including the terminating null-character).
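An application may want to check these limits before passing a string to the function. The sketch below is only an illustration: the helper names and the two constants are hypothetical, the input is assumed to be well-formed UTF-8, and "character" is assumed to mean one Unicode code point, which the page does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Limits taken from the argument description above. */
#define MAX_INPUT_CHARS 2047
#define MAX_INPUT_BYTES 8189   /* including the terminating null-character */

/* Hypothetical helper, not an SDK function.
 * Counts code points in a well-formed UTF-8 string by counting every byte
 * that is not a continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

/* Returns 1 if s is within both documented limits, 0 otherwise. */
int fits_input_limits(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1 <= MAX_INPUT_BYTES
        && utf8_char_count(s) <= MAX_INPUT_CHARS;
}
```

The byte limit follows from the character limit: 2047 characters of at most 4 bytes each, plus one byte for the terminator, gives 8189 bytes.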

Argument attributes

None

Return values

Pointer to a UTF-16 encoded string
This is a "wide character" (WCHAR) string where each character is encoded in UTF-16. The maximum length is 2047 characters (i.e. 4096 bytes at most, including the terminating null-character).

Returned attributes

None

Usage

...

See also