Alan Wood’s Unicode resources

Unicode and multilingual support in HTML, fonts, Web browsers and other applications


Introduction

Before Unicode became widely supported, it was not uncommon to run into problems such as these. You tried to include a passage in a different alphabet in one of your documents, for example a quotation in Russian in an English document, only to find that you had no Cyrillic characters available. You sent a Spanish document in electronic form to someone in Greece, only to be told that the accented Latin characters had been replaced by Greek characters. You produced a Web page that included technical symbols, and found that it worked with Windows but not with Mac OS or Unix. Problems like these arose with non-Latin alphabets and the Symbol font because in those days most computers used fonts that contained a maximum of 256 characters. The first 128 characters (the ASCII characters) of most fonts included punctuation marks, numbers and the letters a–z and A–Z, and were not a problem. In the USA, Canada, the United Kingdom, the rest of the English-speaking world and much of Western Europe, the second set of 128 characters comprised more punctuation marks, some currency symbols (such as £ and ¥) and a lot of accented letters (such as á, ç, è, ñ, ô and ü). Older English versions of Microsoft Windows, and several other language editions, used this set of 256 characters, which is known as the ANSI character set.

If you lived in a country such as Egypt, Greece, Israel, Russia or Thailand that uses a different alphabet, then your version of Windows used a different character set. The first 128 characters were the same as in ANSI, but many of the places in the second set of 128 were taken by characters from the Arabic, Greek, Hebrew, Cyrillic or Thai alphabets. When documents started to be transferred electronically as e-mail messages, e-mail attachments or Web pages, instead of on paper, reading documents from another country, particularly a country with a different alphabet, became more and more of a problem. There were similar problems when moving documents between operating systems such as DOS, Windows, Mac OS and UNIX.
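The clash between these character sets can be sketched in a few lines of Python. This is only an illustration: it assumes that the Windows code pages shipped with Python (cp1252, cp1251, cp1253, cp1255, cp1256 and cp874) stand in for the ANSI, Cyrillic, Greek, Hebrew, Arabic and Thai versions of Windows described above, and the names raw, code_pages and decode_everywhere are made up for the example.

```python
# One byte value from the second set of 128 characters (values 128 to 255).
raw = bytes([0xE1])

# Assumption: these Python codec names stand in for the regional Windows character sets.
code_pages = {
    "cp1252": "Western European (ANSI)",
    "cp1251": "Cyrillic",
    "cp1253": "Greek",
    "cp1255": "Hebrew",
    "cp1256": "Arabic",
    "cp874": "Thai",
}


def decode_everywhere(data):
    """Decode the same bytes under every code page and collect the results."""
    return {name: data.decode(name) for name in code_pages}


results = decode_everywhere(raw)
for name, text in results.items():
    # Show which Unicode character the single byte 0xE1 turns into.
    print(f"{name:7} {code_pages[name]:25} U+{ord(text):04X} {text}")
```

The byte that is á under cp1252 comes out as α under cp1253, which is the Spanish document sent to Greece; under cp1251 the same byte is the Cyrillic letter б.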

The solution was to leave behind the assortment of 8-bit fonts with their limit of 256 characters, where the same character number represented a different character in different alphabets, and move to a system that assigns a unique number to each character in each of the major languages of the world. Such a system was developed and is known as Unicode. It is intended for use on all computer systems, not just Windows, and covers Chinese, Japanese and Korean as well as the alphabets for many other languages and scripts, plus a large number of special characters. Some Unicode support has been included in Microsoft Windows since Windows 95, and Windows NT 4, Windows 2000, Windows XP, Windows Vista, Windows 7 and Windows 8 are based on Unicode instead of the ANSI or WGL4 character sets. Some Unicode support has been included in Mac OS since Mac OS 8.5, but prior to Mac OS X 10 only limited use was made of it by applications. Unicode is sometimes referred to as a 16-bit system, which would allow for only 65,536 characters, but this is not correct, and Unicode has the potential to cope with over one million unique characters.
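The "over one million" figure can be checked with a little arithmetic. The sketch below is in Python and relies on one fact that is not stated above: the Unicode code space is divided into 17 planes of 65,536 code points each, running from U+0000 to U+10FFFF. The character U+10348 (Gothic letter hwair) is chosen only as an example of a character that does not fit in a single 16-bit number.

```python
import sys

PLANES = 17             # planes 0 to 16
PLANE_SIZE = 0x10000    # 65,536 code points, the most a single 16-bit number can address
code_space = PLANES * PLANE_SIZE

# Python exposes the highest code point it can store in a string.
highest = sys.maxunicode

# A character beyond the first 65,536 code points.
ch = "\U00010348"
utf16 = ch.encode("utf-16-be")  # needs two 16-bit units (a surrogate pair)
utf8 = ch.encode("utf-8")       # needs four bytes

print(f"code space: {code_space:,} code points")
print(f"highest code point: U+{highest:X}")
print(f"U+{ord(ch):X} in UTF-16: {utf16.hex()}")
print(f"U+{ord(ch):X} in UTF-8:  {utf8.hex()}")
```

In UTF-16 the character occupies four bytes rather than two, which is why describing Unicode as a 16-bit system is misleading.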

The current version (6.3.0) of the Unicode Standard, developed by the Unicode Consortium, assigns a unique identifier to each of 110,187 graphical, formatting and control characters, covering the scripts of the world’s principal written languages and many mathematical and other symbols. A previous version (2.1) of the Unicode Standard encompassed 38,887 characters and was adopted as part of the recommendations for HTML 4.0.

Version  Date            Increase  Characters
1.0.0    October 1991                   7,161
1.0.1    June 1992         21,198      28,359
1.1.0    June 1993          5,620      33,979
2.0.0    July 1996          4,971      38,950
2.1.2    May 1998               2      38,952
3.0.0    September 1999    10,307      49,259
3.1.0    March 2001        44,946      94,205
3.2.0    March 2002         1,016      95,221
4.0.0    April 2003         1,226      96,447
4.1.0    March 2005         1,273      97,720
5.0.0    July 2006          1,369      99,089
5.1.0    April 2008         1,624     100,713
5.2.0    October 2009       6,648     107,361
6.0.0    October 2010       2,088     109,449
6.1.0    January 2012         732     110,181
6.2.0    September 2012         1     110,182
6.3.0    October 2013           5     110,187

On this Web site, I have tried to gather practical information about Unicode and the growing number of applications and fonts that support it, to help people who are trying to use Unicode to produce standardised multilingual and technical documents. The pages on the site include:

Test pages for Unicode ranges
Lists of Unicode characters that you can use to test the Unicode support of your Web browser and fonts.
Search for a Unicode character
Search the test pages to find any character that you want to use.
Fonts for each Unicode range
A list of Unicode ranges and the fonts that support them.
Unicode fonts
Lists of fonts for Windows, Mac OS 9, Mac OS X 10 and Unix, with the Unicode ranges they support, and where to obtain them.
Browsers for Apple Macintosh computers
How to enable Unicode support in Web browsers under Mac OS 9.
Internet Explorer for Windows
How to enable Unicode support in IE 4, IE 5, IE 5.5 and IE 6.
Netscape for Windows
How to enable Unicode support in Netscape Communicator 4.x and 6.x.
Editors and word processors
Applications for Windows, Mac OS 9, Mac OS X 10 and Unix that can produce Unicode text, HTML and word processor documents.
File conversion, font and keyboard utilities
Utilities for Mac OS 9, Mac OS X 10, Windows and Unix that can convert files to and from Unicode, view the characters in Unicode fonts, or re-map your keyboard to type Unicode characters.
Creating multilingual Web pages
HTML code, fonts and editors to help you produce Web pages with multiple scripts and languages.

For a complete list of pages on the site, please see the Site Map.

Test pages for Unicode character ranges

The pages in the following list can be used to display the ranges of characters defined in the Unicode 6.3.0 Character Database, within the limitations imposed by your Web browser and the fonts that you have installed. There is also a page with a sample of Unicode characters from each range.

General Scripts
Symbols
Miscellaneous
Chinese, Japanese and Korean
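A listing of the kind shown on a test page can be approximated with Python's unicodedata module. This is a rough sketch, not the method used to build the pages on this site: list_range is a made-up helper, and the module reflects whichever version of the Unicode Character Database your Python installation ships with, which may not be 6.3.0.

```python
import unicodedata


def list_range(first, last):
    """Return (code point label, character, name) for each named character in a range."""
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        # name() returns the default (None) for code points without a character name.
        name = unicodedata.name(ch, None)
        if name is not None:
            rows.append((f"U+{cp:04X}", ch, name))
    return rows


# The first five Greek capital letters, as an example range.
rows = list_range(0x0391, 0x0395)
for label, ch, name in rows:
    print(label, ch, name)
```

Whether each character then displays correctly still depends on the fonts installed, just as it does for the test pages.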

Web sites of other Unicode proponents

Alan Flavell
Unicode test material
Andrew Cunningham
Multilingual Unicode web page development
Apple Computer, Inc.
Unicode Utilities
Babel
Towards communicating on the Internet in any language...
Brian Wilson
Text in HTML...
Bruno Haible
The Unicode HOWTO (for Linux)
Christoph Singer
Slavic Text Processing and Typography
Daniel Tobias
Dan's Web Tips: Characters and Fonts
Frank da Cruz
UTF-8 Sampler
Henry Churchyard
Latin 1 and Unicode characters in &ampersand; entities
James Kass
Does Your Browser Support Multi-language?
Jukka Korpela
Using national and special characters in HTML
Markus Kuhn
UTF-8 and Unicode FAQ for Unix/Linux
Michael Everson
Evertype
Microsoft
Global Software Development and Computing Portal
Nelson H. F. Beebe
Fonts for the Unicode Character Set
Oscar van Vlijmen
Unicode browser display
Roman Czyborra
Unicode in the Unix Environment
Sun
Unicode Support in Solaris
Tex Texin
Internationalization (I18n), Localization (L10n), Standards, and Amusements
TITUS
Titus Is Testing Unicode Scriptmanagement
Tom Gewecke
Unleash your Multilingual Mac
Unicode Consortium
Unicode Home Page
Wazu Japan
Gallery of Unicode Fonts

Copyright © 1999–2013 Alan Wood

Page created 3rd February 1999 – Page last updated 24th November 2013

Site last updated 24th November 2013

Send comments or questions to Alan Wood
