aw_string_to_unicode
Minimum requirements | |
---|---|
Added in version | 5.0 |
SDK | build 80 |
WCHAR* aw_string_to_unicode (char *str)
Description
Converts a UTF-8 encoded string to a UTF-16 "wide character" (WCHAR) string. For the purposes of the SDK, these wide character strings are also called "Unicode" strings.
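To make the conversion concrete, the following is a minimal sketch of a UTF-8 to UTF-16 decoder. It is not the SDK's implementation; the function name `utf8_to_utf16`, the caller-supplied buffer, and the use of `uint16_t` as a stand-in for WCHAR are all assumptions made for illustration. The sketch checks for malformed continuation bytes and surrogates but does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the SDK.
 * Converts null-terminated UTF-8 into null-terminated UTF-16 code units.
 * cap is the capacity of dst in code units, including the terminator.
 * Returns the number of code units written (excluding the terminator),
 * or (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s != 0) {
        uint32_t cp;
        int extra; /* number of continuation bytes expected */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1; /* stray continuation or invalid lead byte */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)
                return (size_t)-1; /* also catches a premature terminator */
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
            s++;
        }

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1; /* not a valid Unicode scalar value */

        if (cp < 0x10000) {
            if (n + 1 >= cap) return (size_t)-1; /* need room for unit + null */
            dst[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= cap) return (size_t)-1; /* need room for pair + null */
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
        }
    }

    if (n >= cap) return (size_t)-1; /* no room for the terminator */
    dst[n] = 0;
    return n;
}
```

Note that a character outside the Basic Multilingual Plane produces two code units, so the count returned here is in UTF-16 code units rather than characters.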
Callback
None
Notes
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII.
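Each Unicode character is represented by 1 to 4 bytes when encoded as UTF-8. A character that fits in a single WCHAR needs at most 3 bytes; a character outside the Basic Multilingual Plane needs 4 bytes and occupies two WCHARs (a surrogate pair) in UTF-16. For example, the Unicode character 0x221E or "∞" (infinity) is encoded as the byte sequence E2 88 9E, which an application expecting Windows-1252 would display as "âˆž".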
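The byte sequence E2 88 9E for 0x221E can be reproduced with a small encoder. This is an illustrative sketch only; `utf8_encode` is a hypothetical helper name and is not provided by the SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the SDK.
 * Encodes one Unicode code point as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies beyond 0x10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                        /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_encode(0x221E, buf)` returns 3 and fills `buf` with E2 88 9E.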
UTF-8 encoding has two main advantages:
- Unicode (WCHAR) strings often contain bytes that are 0x00, which ASCII applications interpret as terminating null-characters. In UTF-8, only the Unicode character 0x0000 (null-character) is encoded as 0x00.
- ASCII codes 0 - 127 (including the printable characters 32 - 126) are encoded identically in UTF-8, which means that English text remains readable when UTF-8 is used by ASCII applications.
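The first advantage can be seen by handing both encodings to a byte-oriented function such as `strlen`. The sketch below is illustrative only; the array names and `byte_string_length` are hypothetical, and the UTF-16 bytes are written out by hand in little-endian order.

```c
#include <stddef.h>
#include <string.h>

/* "Hi" as raw UTF-16 little-endian bytes: every second byte is 0x00. */
static const char hi_utf16le[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };

/* The same text as UTF-8: byte-for-byte identical to its ASCII form. */
static const char hi_utf8[] = "Hi";

/* Length as seen by byte-oriented C string functions. */
size_t byte_string_length(const char *bytes)
{
    return strlen(bytes);
}
```

Here `byte_string_length(hi_utf16le)` stops at the first 0x00 and reports 1, while `byte_string_length(hi_utf8)` reports 2.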
Caution should be exercised when using these Unicode strings across different platforms. Little-endian platforms, such as Intel processors, store the Unicode character "∞" (0x221E) in memory as the byte sequence 1E 22, while big-endian platforms (such as PowerPC/Motorola) store it as 22 1E. If you are not able to encode the string in UTF-8, your application may need to prefix it with a "byte order mark" so that the receiver can determine the byte order.
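One way to avoid depending on the host's byte order is to serialize each UTF-16 code unit explicitly and to check for a byte order mark (the character 0xFEFF) when reading. The helpers below are a hypothetical sketch, not SDK functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one UTF-16 code unit in little-endian order (low byte first). */
void put_utf16_le(uint16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)(unit >> 8);
}

/* Write one UTF-16 code unit in big-endian order (high byte first). */
void put_utf16_be(uint16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit >> 8);
    out[1] = (unsigned char)(unit & 0xFF);
}

/* Inspect a leading byte order mark (0xFEFF).
 * Returns 1 for little-endian, 0 for big-endian, -1 if no mark is present. */
int utf16_bom_byte_order(const unsigned char *bytes, size_t len)
{
    if (len < 2)
        return -1;
    if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        return 1;
    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        return 0;
    return -1;
}
```

Because the helpers use shifts rather than pointer casts, they produce the same bytes regardless of the processor they run on.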
Important: The returned pointer is only valid immediately after the call to this method and may be invalidated by subsequent calls. Therefore an application must copy the string to a local buffer if it needs to use the string for an extended period of time.
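A bounded copy into an application-owned buffer is one way to keep the result past the next SDK call. The sketch below assumes WCHAR is a 16-bit type and uses `uint16_t` as a stand-in; `copy_wide` is a hypothetical helper, not an SDK function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the SDK.
 * Copies a null-terminated UTF-16 string into dst, which holds cap code
 * units including the terminator. Always null-terminates when cap > 0 and
 * truncates if the source is too long.
 * Returns the number of code units copied, excluding the terminator. */
size_t copy_wide(uint16_t *dst, size_t cap, const uint16_t *src)
{
    size_t i = 0;

    if (cap == 0)
        return 0;

    if (src != NULL) {
        while (i + 1 < cap && src[i] != 0) {
            dst[i] = src[i];
            i++;
        }
    }

    dst[i] = 0;
    return i;
}
```

With the SDK available, an application would pass the pointer returned by aw_string_to_unicode as `src` immediately after the call. A destination of 2048 code units is enough for the documented maximum of 2047 characters plus the terminating null-character.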
Arguments
- str
- UTF-8 encoded string. The maximum length is 2047 characters (i.e. 8189 bytes at most, including the terminating null-character).
Argument attributes
None
Return values
- Pointer to a UTF-16 encoded string
- This is a "wide character" (WCHAR) string where each character is encoded in UTF-16. The maximum length is 2047 characters (i.e. 4096 bytes at most, including the terminating null-character).
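The byte limits quoted for the argument and the return value follow from the 2047-character maximum. The constants below only restate that arithmetic; their names are hypothetical and do not appear in the SDK headers.

```c
/* Hypothetical constants derived from the documented limits. */
enum {
    AW_EXAMPLE_MAX_CHARS   = 2047,

    /* UTF-8 input: up to 4 bytes per character, plus a 1-byte terminator. */
    AW_EXAMPLE_UTF8_BYTES  = AW_EXAMPLE_MAX_CHARS * 4 + 1,

    /* UTF-16 output: one 2-byte unit per character, plus a 2-byte terminator. */
    AW_EXAMPLE_UTF16_UNITS = AW_EXAMPLE_MAX_CHARS + 1,
    AW_EXAMPLE_UTF16_BYTES = AW_EXAMPLE_UTF16_UNITS * 2
};
```

These evaluate to 8189 bytes for the UTF-8 input and 4096 bytes for the UTF-16 output, matching the figures given above.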
Returned attributes
None
Usage
...