| Unicode code point | Character |
| U+0041 | A |
| U+00E9 | é |
| U+03B8 | θ (the Greek theta) |
| U+20AC | € (the euro) |

| Character | Unicode code point | Byte values in file (UCS-2) |
| A | U+0041 | 0x00, 0x41 |
| é | U+00E9 | 0x00, 0xE9 |
| θ (theta) | U+03B8 | 0x03, 0xB8 |
| € (euro) | U+20AC | 0x20, 0xAC |
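
The UCS-2 byte values in the table can be reproduced with a few lines of C. This is only a minimal sketch: the function name `ucs2_encode` is hypothetical, and it assumes the high byte is stored first, which is the order the table shows.

```c
#include <stdint.h>

/* Hypothetical helper, not part of any library: store one code point as
 * two bytes, high byte first (the byte order shown in the table).
 * Returns 0 on success, -1 if the code point does not fit in 16 bits. */
int ucs2_encode(uint32_t cp, unsigned char out[2])
{
    if (cp > 0xFFFF)
        return -1;                        /* outside the 16-bit range */
    out[0] = (unsigned char)(cp >> 8);    /* high byte */
    out[1] = (unsigned char)(cp & 0xFF);  /* low byte */
    return 0;
}
```

Calling `ucs2_encode(0x20AC, buf)` fills `buf` with 0x20, 0xAC, matching the euro row. Note that even a plain ASCII character such as A produces a 0x00 byte.
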
char type. This is why C also defines the wchar_t
type, which can hold a 32-bit character (at least in GNU systems). To
avoid both of these disadvantages, UTF-8 was introduced.

| Character | Unicode code point | Byte values in file (UTF-8) |
| A | U+0041 | 0x41 |
| é | U+00E9 | 0xC3, 0xA9 |
| θ (theta) | U+03B8 | 0xCE, 0xB8 |
| € (euro) | U+20AC | 0xE2, 0x82, 0xAC |
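
The UTF-8 byte values in the table follow from a simple bit-packing rule, sketched below for code points up to U+FFFF (the range the table covers). The names `utf8_encode` and `utf8_length` are hypothetical, and the length function assumes its input is valid, NUL-terminated UTF-8. Since one character may occupy one to three bytes here, counting characters means skipping continuation bytes, those of the form 10xxxxxx.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point (up to U+FFFF) as UTF-8.
 * Writes 1 to 3 bytes to out and returns how many were written,
 * or 0 if the code point is outside the range handled here. */
size_t utf8_encode(uint32_t cp, unsigned char out[3])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;
}

/* Hypothetical helper: count characters in a NUL-terminated UTF-8 string
 * by counting every byte that is not a continuation byte (10xxxxxx).
 * Assumes the input is valid UTF-8. */
size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the four characters in the table stored back to back, `strlen` would report 8 bytes, while `utf8_length` reports 4 characters. No byte of a multi-byte sequence is ever 0x00, so the string can still be NUL-terminated.
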
wchar_t type), although it is a little more difficult to get the length of the string.

$Id: encoding.html,v 1.3 2002/01/13 12:20:00 verthezp Exp $
$Name: R0_90_0 $