Aw string from unicode

From ActiveWiki
Revision as of 21:46, 18 November 2009 by Dr. Squailboont (talk | contribs)


Minimum requirements
Added in version 5.0
SDKbuild 80


char* aw_string_from_unicode (WCHAR *str)

Description

Converts a UTF-16 "wide character" (WCHAR) string to a UTF-8 encoded string. For the purposes of the SDK, these wide character strings are also called "Unicode" strings.

Callback

None

Notes

UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII.

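Each WCHAR (one UTF-16 code unit) is encoded as 1 to 3 bytes of UTF-8. A character outside the Basic Multilingual Plane occupies two WCHARs (a surrogate pair) and is encoded as 4 bytes. For example, the Unicode character 0x221E or "∞" (infinity) is encoded as the byte sequence E2 88 9E, which an application that assumes a single-byte code page such as Windows-1252 displays as "âˆž".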
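The byte sequence in the example can be reproduced by hand. The following is a minimal sketch, not the SDK's implementation: utf8_from_bmp_unit is a hypothetical helper that encodes a single UTF-16 code unit outside the surrogate range, which always yields 1 to 3 bytes. A 4-byte UTF-8 sequence only arises from a surrogate pair, which this sketch deliberately does not handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the SDK: encodes one UTF-16 code unit
   outside the surrogate range (0xD800-0xDFFF) as UTF-8.
   Writes 1 to 3 bytes into out and returns the number of bytes written. */
size_t utf8_from_bmp_unit(uint16_t unit, unsigned char out[3])
{
    assert(unit < 0xD800 || unit > 0xDFFF);   /* surrogates need pairing */

    if (unit < 0x80) {                         /* 0xxxxxxx */
        out[0] = (unsigned char)unit;
        return 1;
    }
    if (unit < 0x800) {                        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (unit >> 6));
        out[1] = (unsigned char)(0x80 | (unit & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (unit >> 12));
    out[1] = (unsigned char)(0x80 | ((unit >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (unit & 0x3F));
    return 3;
}
```

Passing 0x221E writes the three bytes E2 88 9E, while an ASCII value such as 0x41 is written unchanged as a single byte.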

UTF-8 encoding has two main advantages:

  • Unicode (UTF-16) strings often contain bytes that are 0x00, which ASCII applications interpret as terminating null-characters. In UTF-8, only the Unicode character 0x0000 (null-character) is encoded as 0x00.
  • ASCII codes 32 - 127 are encoded identically in UTF-8, which means that English text remains readable when UTF-8 is processed by ASCII applications.
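Both advantages can be seen by looking at the raw bytes of a short string. The sketch below spells out the bytes of "Hi" explicitly so that the result does not depend on the host platform. The arrays and the helper ascii_visible_length are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" as UTF-16 in little-endian byte order: every ASCII letter is
   followed by a 0x00 byte, then a two-byte terminator. */
const unsigned char hi_utf16le[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };

/* "Hi" as UTF-8: identical to ASCII, with a single 0x00 terminator. */
const char hi_utf8[] = { 0x48, 0x69, 0x00 };

/* Hypothetical helper: the number of bytes a byte-oriented routine such as
   strlen() sees before it reaches the first 0x00. */
size_t ascii_visible_length(const void *bytes)
{
    assert(bytes != NULL);
    return strlen((const char *)bytes);
}
```

A byte-oriented routine sees only one byte of the UTF-16 data before stopping at the embedded 0x00, but sees both letters of the UTF-8 data.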

Caution should be exercised when using these Unicode strings across different platforms. Little-endian platforms, such as Intel processors, store the Unicode character "∞" in memory as the bytes 1E 22, while big-endian platforms (such as PowerPC/Motorola) store it as 22 1E. If you are not able to encode the string in UTF-8, it may be necessary for your application to prefix the string with a "byte order mark".
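As a rough illustration of the byte order issue, the sketch below checks how the host stores 0x221E and inspects a buffer for a byte order mark (the character 0xFEFF, which appears as FF FE in little-endian data and FE FF in big-endian data). Both function names are hypothetical and neither is provided by the SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when this machine stores the low-order byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x221E;          /* the infinity character */
    unsigned char bytes[2];

    memcpy(bytes, &probe, sizeof bytes);
    return bytes[0] == 0x1E;                /* 1E 22 on little-endian */
}

/* Hypothetical helper: inspects the first two bytes of a UTF-16 buffer for
   a byte order mark. Returns 'L' for little-endian (FF FE), 'B' for
   big-endian (FE FF), or 0 when no mark is present. */
char utf16_order_from_bom(const unsigned char *bytes)
{
    assert(bytes != NULL);

    if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        return 'L';
    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        return 'B';
    return 0;
}
```

A receiver that finds no mark has to learn the byte order some other way, which is one reason exchanging the string as UTF-8 is simpler.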

Important: The returned pointer is only valid immediately after the call to this method and may be invalidated by subsequent calls. Therefore an application must copy the string to a local buffer if it needs to use the string for an extended period of time.
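One way to follow this advice is to copy the result into a buffer owned by the application straight after the call. The sketch below assumes a hypothetical helper named copy_utf8. The buffer size of 4096 bytes is taken from the return value limits documented on this page, and wide_name in the comment stands for any WCHAR string the application already holds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Worst-case size of the converted string in bytes, including the
   terminating null-character, per the return value limits on this page. */
#define UTF8_RESULT_MAX 4096

/* Hypothetical helper: copies a null-terminated UTF-8 string into dst.
   Returns 0 on success, or -1 if src (with its terminator) does not fit,
   in which case dst is left untouched. */
int copy_utf8(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);

    len = strlen(src);
    if (len + 1 > dst_size)
        return -1;
    memcpy(dst, src, len + 1);              /* include the terminator */
    return 0;
}

/* Intended use with the SDK (not compiled here):
 *
 *     char name[UTF8_RESULT_MAX];
 *     if (copy_utf8(name, sizeof name, aw_string_from_unicode(wide_name)) == 0) {
 *         // name remains valid after later SDK calls
 *     }
 */
```

The helper refuses to copy rather than truncating, because cutting a UTF-8 string at an arbitrary byte can split a multi-byte character.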

Arguments

str
UTF-16 string to be converted. This is a "wide character" (WCHAR) string where each character is encoded in UTF-16. The maximum length is 4095 characters (i.e. 8192 bytes at most, including the terminating null-character).
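An application that accepts wide strings from outside sources may want to check this limit before converting. The following is a minimal sketch under stated assumptions: wide_string_fits is a hypothetical helper, and uint16_t stands in for WCHAR so that the example compiles without the Windows headers. How the SDK itself reacts to an over-long string is not documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WIDE_CHARS 4095   /* limit stated for the str argument */

/* Hypothetical helper: returns 1 if the null-terminated UTF-16 string has
   at most MAX_WIDE_CHARS code units before its terminator, otherwise 0.
   Scanning stops as soon as the limit is exceeded. */
int wide_string_fits(const uint16_t *str)
{
    size_t n = 0;

    assert(str != NULL);

    while (str[n] != 0) {
        if (++n > MAX_WIDE_CHARS)
            return 0;
    }
    return 1;
}
```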

Argument attributes

None

Return values

Pointer to a UTF-8 encoded string
The returned string occupies at most 4096 bytes, including the terminating null-character. The maximum length is therefore at least 1023 characters (at 4 bytes per character) and at most 4095 characters (at 1 byte per character).

Returned attributes

None

Usage

...

See also