Aw string from unicode
Minimum requirements | |
---|---|
Added in version | 5.0 |
SDK | build 80 |
char* aw_string_from_unicode (WCHAR *str)
Description
Converts a UTF-16 "wide character" (WCHAR) string to a UTF-8 encoded string. For the purposes of the SDK, these wide character strings are also called "Unicode" strings.
Callback
None
Notes
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII.
Each character is encoded as 1 to 4 bytes in UTF-8: a single WCHAR takes 1 to 3 bytes, and a surrogate pair (two WCHARs) takes 4 bytes. For example, the Unicode character 0x221E, "∞" (infinity), is encoded as the byte sequence E2 88 9E.
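The byte layout described above can be sketched in C. This is only an illustration of the standard UTF-8 bit patterns, not the SDK's own implementation; the function name `utf8_encode` is hypothetical and does not exist in the SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the SDK): encode one Unicode code
 * point as UTF-8. Writes up to 4 bytes to out and returns the number of
 * bytes written, or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_encode(0x221E, buf)` fills `buf` with E2 88 9E and returns 3, matching the infinity example.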
UTF-8 encoding has two main advantages:
- UTF-16 strings often contain bytes that are 0x00, which ASCII applications interpret as terminating null-characters. In UTF-8, only the Unicode character 0x0000 (null-character) is encoded as 0x00.
- ASCII codes 0 - 127 are encoded identically in UTF-8, which means that English text remains readable when UTF-8 is processed by ASCII applications.
Caution should be exercised when exchanging UTF-16 strings between platforms. Little-endian platforms, such as Intel processors, store the Unicode character "∞" (0x221E) as the byte sequence 1E 22, while big-endian platforms (such as PowerPC/Motorola) store it as 22 1E. If you are not able to encode the string in UTF-8, your application may need to prefix it with a "byte order mark" (0xFEFF) so that the receiver can detect the byte order.
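As a rough illustration of the byte order issue, the sketch below writes a UTF-16 code unit in an explicit byte order and checks a two-byte prefix for a byte order mark. All three function names are hypothetical helpers for this example and are not part of the SDK.

```c
#include <stdint.h>

/* Hypothetical helper: store one UTF-16 code unit least significant
 * byte first (little-endian). */
void utf16_store_le(uint16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)(unit >> 8);
}

/* Hypothetical helper: store one UTF-16 code unit most significant
 * byte first (big-endian). */
void utf16_store_be(uint16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit >> 8);
    out[1] = (unsigned char)(unit & 0xFF);
}

/* Hypothetical helper: inspect the first two bytes of a buffer for the
 * byte order mark U+FEFF. Returns 'L' for little-endian (FF FE),
 * 'B' for big-endian (FE FF), or 0 if no mark is present. */
int utf16_detect_bom(const unsigned char bytes[2])
{
    if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        return 'L';
    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        return 'B';
    return 0;
}
```

Writing the bytes explicitly, rather than copying a WCHAR array from memory, keeps the serialized form the same regardless of the host processor.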
Important: The returned pointer is only valid immediately after the call to this function and may be invalidated by subsequent calls. Therefore, an application must copy the string to a local buffer if it needs to use the string for an extended period of time.
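One possible way to keep the result is to copy it into a caller-owned buffer right away. The helper below is a minimal sketch; `copy_sdk_string` is a hypothetical name, not an SDK function, and the 4096-byte buffer size in the comment is taken from the return value limits on this page.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the SDK): copy src into dst, which
 * has room for dst_size bytes, and always null-terminate the result.
 * Returns 1 if the whole string fit, 0 if it was truncated or an
 * argument was invalid. Truncation cuts at a byte boundary, so it may
 * split a multi-byte UTF-8 sequence. */
int copy_sdk_string(const char *src, char *dst, size_t dst_size)
{
    size_t len;

    if (src == NULL || dst == NULL || dst_size == 0)
        return 0;

    len = strlen(src);
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1);  /* includes the terminating null */
    return 1;
}

/* Intended use, assuming wide_name is a valid WCHAR string:
 *
 *     char name[4096];
 *     copy_sdk_string(aw_string_from_unicode(wide_name), name, sizeof name);
 */
```

The copy should happen before any other SDK call is made, since the page does not say which calls invalidate the pointer.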
Arguments
- str
- UTF-16 string to be converted. This is a "wide character" (WCHAR) string where each character is encoded in UTF-16. The maximum length is 4095 characters (i.e. 8192 bytes at most, including the terminating null-character).
Argument attributes
None
Return values
- Pointer to a UTF-8 encoded string
- The result occupies at most 4096 bytes, including the terminating null-character. Depending on the characters it contains, the maximum length is therefore between 1023 characters (at 4 bytes per character) and 4095 characters (at 1 byte per character).
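Because the number of UTF-8 bytes depends on the characters involved, an application may want to estimate the encoded size before converting. The sketch below counts the bytes a UTF-16 string would need under standard UTF-8 rules. `utf8_bytes_needed` is a hypothetical helper, and whether the SDK itself combines surrogate pairs into 4-byte sequences is an assumption not confirmed by this page.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the SDK): return the number of
 * UTF-8 bytes needed to encode a null-terminated UTF-16 string,
 * excluding the terminating null. A valid surrogate pair counts as one
 * 4-byte character; an unpaired surrogate is counted as 3 bytes. */
size_t utf8_bytes_needed(const uint16_t *s)
{
    size_t total = 0;

    while (*s != 0) {
        uint16_t u = *s++;

        if (u < 0x80) {
            total += 1;
        } else if (u < 0x800) {
            total += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF &&
                   *s >= 0xDC00 && *s <= 0xDFFF) {
            total += 4;   /* high surrogate followed by low surrogate */
            s++;
        } else {
            total += 3;
        }
    }
    return total;
}
```

A result above 4095 would suggest that the converted string cannot fit in the 4096 bytes described above.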
Returned attributes
None
Usage
...