Version 2 (modified by noz, 8 years ago)

Created skeleton

UTF-8 and Wide-character handling in Angband

Background

As of the sequence of commits from 589d1d3 to c91ae22 (plus a few bug fixes and cleanups since), the Angband source can handle UTF-8 characters in its edit files, and has dropped the previous hacky mechanism for generating accented characters (with sequences like ["e] for ë).

A number of changes were needed for this to work, and this page aims to document them.

Locale

Angband now needs to be run within a UTF-8 capable locale. This is checked in main.c:main(), as:

	if (setlocale(LC_CTYPE, "")) {
		/* Require UTF-8 */
		if (strcmp(nl_langinfo(CODESET), "UTF-8") != 0)
			quit("Angband requires UTF-8 support");
	}
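
For experimenting with this check outside the game, the following is a minimal standalone sketch. The helper names codeset_is_utf8() and locale_is_utf8() are hypothetical and do not appear in the Angband source. It also assumes a POSIX system, since nl_langinfo() and <langinfo.h> are not available everywhere. Note one deliberate difference from the excerpt: in the excerpt the codeset test is skipped when setlocale() fails, whereas this sketch reports that case as "not UTF-8".

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Angband source): true if the codeset
 * name is exactly "UTF-8", the same comparison the excerpt makes.
 */
static int codeset_is_utf8(const char *codeset)
{
	assert(codeset != NULL);
	return strcmp(codeset, "UTF-8") == 0;
}

/*
 * Hypothetical helper: adopt the LC_CTYPE locale from the environment and
 * report whether its codeset is UTF-8. A failed setlocale() is treated as
 * "not UTF-8" here, which is an assumption of this sketch.
 */
static int locale_is_utf8(void)
{
	if (!setlocale(LC_CTYPE, ""))
		return 0;
	return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A caller would typically run locale_is_utf8() once at startup and exit with a message if it returns 0. The comparison is an exact string match, so a codeset reported under a different spelling would not be accepted.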

Files

All the edit files are now expected to be in the UTF-8 encoding, and can have accented characters inserted directly in them. Output files, such as spoilers, character dumps and other text output, are now also in UTF-8.

(What about screen dumps?)

Internals

"Canvas"

Textblock

Parsers

When the edit files are read, all strings are kept in UTF-8 until they are needed. Glyphs are read directly into a wchar_t.
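
To illustrate what reading a glyph into a wchar_t involves, here is a minimal sketch of decoding one UTF-8 sequence by hand. The function utf8_decode_one() is hypothetical and is not the routine the Angband parser uses. It assumes wchar_t is wide enough to hold any Unicode code point, which holds on typical Linux and OS X builds but not where wchar_t is 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from the Angband source): decode the UTF-8
 * sequence at the start of the NUL-terminated string s into *out.
 * Returns the number of bytes consumed, or 0 if the sequence is invalid.
 * An empty string decodes as the NUL character and consumes 1 byte.
 */
static size_t utf8_decode_one(const char *s, wchar_t *out)
{
	const unsigned char *p = (const unsigned char *)s;
	unsigned long cp;
	size_t len, i;

	assert(s != NULL && out != NULL);

	/* The lead byte gives the sequence length and the top payload bits */
	if (p[0] < 0x80) {
		cp = p[0];
		len = 1;
	} else if ((p[0] & 0xE0) == 0xC0) {
		cp = p[0] & 0x1F;
		len = 2;
	} else if ((p[0] & 0xF0) == 0xE0) {
		cp = p[0] & 0x0F;
		len = 3;
	} else if ((p[0] & 0xF8) == 0xF0) {
		cp = p[0] & 0x07;
		len = 4;
	} else {
		return 0;
	}

	/* Each continuation byte is 10xxxxxx; a NUL here means truncation */
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			return 0;
		cp = (cp << 6) | (p[i] & 0x3F);
	}

	/* Reject overlong encodings, surrogates and out-of-range values */
	if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
		(len == 4 && cp < 0x10000))
		return 0;
	if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
		return 0;

	*out = (wchar_t)cp;
	return len;
}
```

With this sketch, the two bytes 0xC3 0xAB (the UTF-8 encoding of ë) decode to the single wide character 0xEB. Unlike the standard mbtowc() family, the sketch does not depend on the current locale, which is a design choice made here only to keep the example self-contained.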

Ports

This section lists port-specific changes, and describes what each port does with the wide-character representation of the display characters to get them onto the display.

SDL

X11

GCU

Windows

OSX

GTK

Android