UTF-8 Howto for GNU/Linux



Introduction

This page explains how to get GNU/Linux working properly with UTF-8 support. Note: this Howto focuses solely on getting your LC_CTYPE to use UTF-8 and does not cover UTF-8 support inside individual programs.

More information on what UTF-8 is can be found here: http://www.cl.cam.ac.uk/~mgk25/unicode.html

Why would I want UTF-8?

UTF-8 lets you read and type characters in all kinds of languages, as long as your font supports them.

A small example:

  1. Hello world

  2. H€llô wörld

  3. Ηελλο ςορλδ

If you are on UTF-8 or ISO-8859-1+ you will be able to read those 3 lines.

If you can't, and you would like to, this document explains how.

If you can read it but would like to be able to produce those characters yourself and have other people see them correctly, this document will explain that as well.



Getting started

Debian

First off, I'd like to cover Debian users. They should use Debian's own mechanism to get UTF-8 up and running:

root@debianbox # dpkg-reconfigure locales <enter>

Just follow the instructions and UTF-8 will be available for you on Debian.
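
If you'd rather not step through the dialog, the following is a minimal sketch of a non-interactive route. It assumes your Debian release keeps its locale list in /etc/locale.gen with commented-out entries such as "# en_US.UTF-8 UTF-8" and ships the locale-gen command. The function name enable_locale_gen_entry is my own invention, not part of Debian. Try it on a copy of the file first.

```shell
# Hypothetical helper: uncomment one entry in a locale.gen-style file.
# $1 = path to the file, $2 = locale name, e.g. en_US.UTF-8
enable_locale_gen_entry() {
    # Escape the dots so sed matches them literally.
    esc=$(printf '%s' "$2" | sed 's/\./\\./g')
    # Strip the leading "# " from the matching line only.
    sed "s/^#[[:space:]]*\(${esc}[[:space:]]\)/\1/" "$1" > "$1.new" &&
        mv "$1.new" "$1"
}

# Usage (as root), assuming the file layout described above:
#   enable_locale_gen_entry /etc/locale.gen en_US.UTF-8
#   locale-gen
```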

Other distributions

The first thing to do is make sure your kernel supports UTF-8. This document is meant for users who can at least compile a kernel on GNU/Linux, so that is a prerequisite.

Run make menuconfig, make xconfig, or whichever configuration tool you prefer, and enable “NLS UTF-8” support, which can be found under File Systems/Native Language Support. I suggest setting this option to Y, which guarantees it is always available. Enabling it as M (module) might work as well, but I have not tested it.

Compile your kernel and get it up and running.
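
To double-check the setting, you can look the option up in the kernel configuration file. This is only a sketch: it assumes the option's symbol is CONFIG_NLS_UTF8 and that your kernel source lives in /usr/src/linux, and the function name nls_utf8_state is made up for this example.

```shell
# Hypothetical helper: report how NLS UTF-8 is set in a kernel .config.
# $1 = path to the .config file
nls_utf8_state() {
    # A disabled option shows up as a comment, so only "=y"/"=m" lines match.
    case "$(grep -E '^CONFIG_NLS_UTF8=' "$1" || true)" in
        *=y) echo "built-in" ;;
        *=m) echo "module" ;;
        *)   echo "disabled" ;;
    esac
}

# Usage, assuming the source tree location named above:
#   nls_utf8_state /usr/src/linux/.config
```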

Next, create a new locale that supports UTF-8. Here's how:

root@gentoobox # localedef -i en_US -f UTF-8 en_US.UTF-8 <enter>

Note: en_US is used here, but it can just as well be your local language, e.g. nl_NL or es_ES. You can check which locale you are using right now by doing:

root@gentoobox # locale | grep CTYPE <enter>

Should this command return “POSIX” then I'd suggest you simply go with en_US.

Once that's done, you are all set to enable UTF-8 by default.
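
Before moving on, it doesn't hurt to confirm that the new locale really exists. A small sketch, assuming your system lists compiled locales with locale -a and spells the codeset as utf8 or UTF-8 in that list (glibc typically prints en_US.utf8). The name has_utf8_locale is a hypothetical helper, not a standard command.

```shell
# Hypothetical helper: succeed if a UTF-8 variant of locale $1 appears
# in the locale list read from standard input.
has_utf8_locale() {
    # Case-insensitive match on e.g. en_US.utf8 or en_US.UTF-8
    grep -qiE "^$1\.utf-?8$"
}

if locale -a 2>/dev/null | has_utf8_locale en_US; then
    echo "en_US.UTF-8 is available"
else
    echo "en_US.UTF-8 is missing, run localedef again"
fi
```

Replace en_US with the locale you actually created.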



Enabling LC_CTYPE

This step is very distro-specific, so I'll cover each distribution separately.

Gentoo

Edit “/etc/env.d/02locale” and add this line to that file: LC_CTYPE="en_US.UTF-8"

If 02locale does not exist just create it.

Now run: env-update; source /etc/profile

Slackware

Edit “/etc/profile” and look for LC_ALL. Comment the LC_ALL line out and add: export LC_CTYPE="en_US.UTF-8"

Mandrake

Edit “/etc/sysconfig/i18n” and change the line 'LC_CTYPE=en_US' to 'LC_CTYPE=en_US.UTF-8'

SuSE

Edit “/etc/profile.local” and change or add this line: 'LC_CTYPE=en_US.UTF-8'

Also add 'export LC_CTYPE' if it's not already present in that file.

Other distros

Red Hat 8 and higher set the locale to UTF-8 by default, so there's not much to be done. As for other distributions, I don't know. If you figure out how, let me know and I will update this howto.
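
Most of the edits above boil down to making sure a file contains exactly one LC_CTYPE= line with the right value. Here is a rough sketch of that as a shell function. The name set_lc_ctype is hypothetical, it only handles the plain LC_CTYPE=... form (not Slackware's "export LC_CTYPE=..." line), and it assumes the locale name contains no "/" or "&" characters. Run it on a copy of the file before touching anything in /etc.

```shell
# Hypothetical helper: set LC_CTYPE in a config file.
# $1 = path to the file, $2 = locale name, e.g. en_US.UTF-8
set_lc_ctype() {
    file=$1
    value=$2
    if grep -q '^LC_CTYPE=' "$file" 2>/dev/null; then
        # Rewrite the existing line in a temporary copy, then swap it in.
        sed "s/^LC_CTYPE=.*/LC_CTYPE=\"$value\"/" "$file" > "$file.new" &&
            mv "$file.new" "$file"
    else
        # No LC_CTYPE line yet (or no file yet): append one.
        printf 'LC_CTYPE="%s"\n' "$value" >> "$file"
    fi
}

# Usage on a scratch copy:
#   cp /etc/sysconfig/i18n /tmp/i18n.test
#   set_lc_ctype /tmp/i18n.test en_US.UTF-8
```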



Wrapping up

Remember that in all cases en_US should be replaced by the locale you desire.

Make sure that any conflicting locale settings are disabled. LC_ALL, for example, will override your LC_CTYPE. Also make sure that /etc/profile contains no conflicting settings.
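
The reason LC_ALL gets in the way is the lookup order: LC_ALL wins over the category variable (here LC_CTYPE), which in turn wins over LANG. The sketch below merely mirrors that order for illustration. The name effective_ctype is hypothetical, and an empty variable is treated the same as an unset one.

```shell
# Hypothetical helper: print which value decides the character type.
# $1 = value of LC_ALL, $2 = value of LC_CTYPE, $3 = value of LANG
effective_ctype() {
    if [ -n "$1" ]; then
        echo "$1"        # LC_ALL overrides everything else
    elif [ -n "$2" ]; then
        echo "$2"        # then the category variable itself
    elif [ -n "$3" ]; then
        echo "$3"        # then the general default
    else
        echo "POSIX"     # nothing set at all
    fi
}

# Show the value that applies to the current shell.
effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"
```

If this prints something other than the UTF-8 locale you configured, look for a stray LC_ALL in your startup files.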

After a reboot you should be running UTF-8. You can check this by doing:

root@gentoobox # locale charmap <enter>

This should return: “UTF-8”
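
For a slightly more hands-on check, you can also count characters and bytes in a known UTF-8 sequence. The sketch below feeds the two bytes that encode "ö" (octal 303 266) to wc. It assumes your wc honours the current locale for its -m option, and the function names are made up for this example. In a working UTF-8 locale the character count should come out lower than the byte count.

```shell
# Count characters in the two-byte UTF-8 sequence for "ö".
# The result depends on the active locale.
count_chars() {
    printf '\303\266' | wc -m | tr -d ' '
}

# Count raw bytes of the same sequence; this does not depend on the locale.
count_bytes() {
    printf '\303\266' | wc -c | tr -d ' '
}

if [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]; then
    echo "charmap is UTF-8; characters: $(count_chars), bytes: $(count_bytes)"
else
    echo "charmap is not UTF-8"
fi
```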





© Laurens Buhler 2003