Alan Wood’s Unicode Resources
Unicode and Multilingual Editors and Word Processors for Mac OS X

Introduction
Mac OS X 10 did not originally include support for as many languages and scripts as Mac OS 9. Mac OS X 10.1 supported Central European, Cyrillic and Japanese, and Korean, Simplified Chinese and Traditional Chinese were made available as downloads. Mac OS X 10.2 introduced support for Arabic, Devanagari, Greek, Gujarati, Gurmukhi, Hebrew and Thai scripts. Mac OS X 10.3 introduced support for Armenian, Unified Canadian Aboriginal Syllabics and Cherokee scripts. The editors listed below are those that are available in versions designed for Mac OS X; other editors that are designed for Mac OS 9 can be used in Classic mode.

BBEdit
BBEdit is a text editor for OS X 10.3.9 or later that includes extensive support for producing HTML files and program code, as well as plain text files. It can edit text in several left-to-right languages and scripts, including double-byte scripts, and it supports the Mac’s Unicode keyboards. Older versions allowed only one font to be active at a time, so only one non-Latin script plus unaccented Latin characters could be displayed properly at once; current versions can display multiple scripts simultaneously. It can use any installed Web browser for WYSIWYG preview. Files that contain multiple scripts can be opened and saved with UTF-8 or UTF-16 character encoding.
HTML tags and attributes can be typed directly, or selected from a floating palette or a menu, and are shown in user-selectable colours. BBEdit includes an HTML syntax checker, and a link checker for links within your site. It is produced by Bare Bones Software, Inc. and costs US $199.00 plus shipping. A trial copy that can be used for 30 days is available.

jEdit
jEdit is a Unicode text editor that is written in Java and can run under Mac OS X, Linux and Windows. It can be used with any text file, but is intended for editing programming and markup languages, and has syntax colouring for over 60 of these, including HTML and XML. jEdit can open and save files with any encoding that is supported by Java, including UTF-8 and UTF-16. It can use any of the normal Mac OS X keyboards, but not the Unicode Hex Input keyboard.
For multi-script documents, it is convenient to use a large Unicode font such as Arial Unicode MS. To change the default font:
jEdit is produced by Slava Pestov and is freeware. For more information and to download the software, visit the jEdit - Open Source programmer's text editor Web site.

Mellel
Mellel is a Unicode-aware word processor that is designed for Mac OS X and supports many scripts and languages including Latin, Cyrillic, Greek, Arabic, Farsi, Hebrew, Chinese, Japanese and Korean. In addition to its native format, it can import and export RTF files (including multi-script files from Word for Windows) and plain text files with Mac, Windows and ISO encodings. It can use the normal Mac OS X keyboards and the Unicode Hex Input keyboard.
The program is still being developed, and future plans include HTML import and export. Mellel is produced by RedleX and costs US $39; a free trial version is available. More information and downloads are available from the Welcome to RedleX - Creators of Mellel Web site. The optional downloads include Arabic and Hebrew keyboards and Persian fonts.

Mozilla Composer
The Composer component of Mozilla is a multilingual HTML editor that supports Unicode and can edit files in WYSIWYG, WYSIWYG plus tags and plain HTML modes. It supports Apple’s Unicode Hex Input and Extended Roman keyboards. Mozilla Composer can produce files that include multiple scripts and languages, and it can save HTML files with UTF-8 character encoding. By default, Mozilla Composer re-formats your HTML code to conform to its idea of good style. To turn off this option, so that HTML formatting is left alone:
Mozilla Composer is available only as part of Mozilla, which includes the Mozilla Navigator Web browser and can be downloaded free of charge from http://www.mozilla.org/releases/.

Netscape Composer 6.2
The Composer component of Netscape 6.2 is a multilingual HTML editor that supports Unicode and can edit files in WYSIWYG, WYSIWYG plus tags and plain HTML modes. It does not yet support Apple’s Unicode Hex Input and Extended Roman keyboards. Composer 6.2 can produce files that include multiple scripts and languages, and it can save HTML files with UTF-8 character encoding. By default, Netscape Composer re-formats your HTML code to conform to its idea of good style. To turn off this option, so that HTML formatting is left alone:
Netscape Composer is available only as part of Netscape 6.2, which includes Netscape Navigator and can be downloaded free of charge from Netscape 6 Release.

Nisus Writer Express
Nisus Writer Express is a word processor for Mac OS X 10.3 or later. Its preferred file format is Rich Text Format (RTF), but it can also open and save Rich Text Format Directory (RTFD), Microsoft Word, WordPerfect, AbiWord and HTML files. It can open and save text files in UTF-8, UTF-16 and several other encodings. It supports all of the keyboards for left-to-right scripts, the IMEs for CJK, and Apple’s Unicode Hex Input keyboard driver, which allows you to enter any Unicode character by holding down the Option key while typing the 4-character hexadecimal character reference, e.g. 0E05 for the Thai character kho khon. From version 2.5, it supports editing of Arabic and Hebrew.
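The relationship between a 4-character hexadecimal reference and the character it produces can be illustrated with a short Python sketch. This is not how the Unicode Hex Input keyboard driver is implemented; the function name `char_from_hex` is a hypothetical helper introduced only for this illustration.

```python
import string
import unicodedata


def char_from_hex(code: str) -> str:
    """Return the character for a 4-digit hexadecimal code point string.

    Hypothetical helper: it only mimics the idea of typing four hex digits.
    """
    if len(code) != 4 or any(c not in string.hexdigits for c in code):
        raise ValueError("expected exactly four hexadecimal digits")
    return chr(int(code, 16))


# 0E05 is the example used in the text for the Thai character kho khon.
kho_khon = char_from_hex("0E05")
print(kho_khon, unicodedata.name(kho_khon))
```

Running the sketch prints the character together with its formal Unicode name, which is a convenient way to confirm that a hexadecimal reference is the one intended.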
Nisus Writer Express can open and save multi-script files produced by Word for Mac, but it has problems opening multi-script Word for Windows files. Nisus Writer Express is a commercial application; more information is available from Nisus Writer Express. A 30-day trial version is available from Nisus Writer Express Download.

Pepper
Pepper is a text editor for Macintosh computers that runs under both Mac OS 9 and Mac OS X 10, and can make use of the Unicode support that has been built into Mac OS starting with version 8.5. It can therefore use Apple’s Unicode Hex Input keyboard driver, which allows you to enter any Unicode character by holding down the Option key while typing the 4-character hexadecimal character reference, e.g. 0E05 for the Thai character kho khon. Unusually for a text editor, Pepper can display scripts for which Language Kits are installed in appropriate fonts; the mapping can be changed in the FontMapping section of the Preferences dialog box. Alternatively, it can use a single multi-script font; this option is turned on by selecting "ATSUI text rendering" in the Editing section of the Preferences dialog box.
Pepper can import and export files in UTF-8, UTF-16 (big and little endian), ANSI, MacRoman, ShiftJIS, Big5 and all of the ISO 8859 character sets. Pepper has syntax styling for HTML and several programming languages. It has a few aids to producing HTML files, but you have to type in most HTML tags. It can save files with UTF-8 character encoding in order to produce multilingual Web pages. Pepper used to be shareware but is now a commercial application. It is available from Digital Wandering and costs US $35.00.

Simredo 3
Simredo 3.31 is a freeware Unicode text editor, written in Java, that runs under various operating systems, including Mac OS X. Its default format is UTF-16, it can convert over 100 encodings to and from Unicode (UTF-8 and big and little endian UTF-16), and it includes support for Esperanto and right-to-left scripts. Simredo can also re-map the keyboard, to allow typing in unusual scripts, and has a character map (but the facility to copy selected characters to the document does not work in Mac OS X). It has no direct support for HTML, but you can type in HTML tags, or copy and paste all or part of the contents into an existing HTML document in another editor. It supports Apple’s Unicode Hex Input and Extended Roman keyboards.
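To make the difference between UTF-8 and big and little endian UTF-16 concrete, here is a minimal Python sketch of converting text between these encodings. It uses Python's standard codecs rather than any editor's own converter, and the helper name `transcode` and the sample text are assumptions made for the illustration.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target one."""
    return data.decode(source).encode(target)


# Sample multi-script text: Thai kho khon, Cyrillic Zhe, Hebrew Alef.
text = "\u0e05 \u0416 \u05d0"

utf8_bytes = text.encode("utf-8")
utf16_be = transcode(utf8_bytes, "utf-8", "utf-16-be")  # big endian, no byte order mark
utf16_le = transcode(utf8_bytes, "utf-8", "utf-16-le")  # little endian, no byte order mark

# The same character occupies different byte sequences in each encoding.
print("\u0e05".encode("utf-8").hex())      # e0b885
print("\u0e05".encode("utf-16-be").hex())  # 0e05
print("\u0e05".encode("utf-16-le").hex())  # 050e
```

The text survives the round trip unchanged; only the byte representation differs, which is why an editor needs to know the encoding of a file before opening it.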
More information about Simredo and a free download are available from the Simredo 3.3 - Java Unicode Editor Web site.

Style
Style is a shareware text editor that can read and write formats including Rich Text Format (RTF) and Unicode (UTF-16). For editing multiple languages and scripts, it uses Apple’s proprietary character sets, and converts to and from Unicode when documents are saved or opened. It does not support Apple’s Unicode Hex Input and Extended Roman keyboards. It includes an AppleScript for generating HTML files with UTF-8 encoding.
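As a rough idea of what generating an HTML file with UTF-8 encoding involves, the following Python sketch builds a small page that declares its encoding and then encodes it as UTF-8 bytes. It is not Style's AppleScript; the function name `build_utf8_html`, the page structure and the sample text are all assumptions made for the illustration.

```python
import html


def build_utf8_html(title: str, body_text: str) -> bytes:
    """Build a minimal HTML page and return it encoded as UTF-8 bytes."""
    page = (
        "<html>\n"
        "<head>\n"
        # Declare the encoding so that browsers decode the bytes as UTF-8.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"<p>{html.escape(body_text)}</p>\n"
        "</body>\n"
        "</html>\n"
    )
    return page.encode("utf-8")


page_bytes = build_utf8_html(
    "Multilingual test", "Thai \u0e05, Greek \u03b1, Cyrillic \u0416"
)
# To save the page, write the bytes in binary mode, for example:
# open("multilingual.html", "wb").write(page_bytes)
```

The two things that must agree are the declared charset and the encoding actually used for the bytes; a mismatch is the usual cause of garbled multi-script pages.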
More information about Style and a trial download are available from the Welcome to Style! Web site. Style is shareware, and registration costs US $12.00.

SUE
SUE (Simple Unicode Editor) is an experimental Unicode editor that makes use of the Unicode support that is built into Mac OS X. This means that it can make full use of the large Unicode fonts that are designed for Windows, such as Arial Unicode MS and Bitstream CyberBit, as well as the Unicode fonts supplied with Mac OS X. It can also use Apple’s Unicode Hex Input keyboard driver, which allows you to enter any Unicode character by holding down the Option key while typing the 4-character hexadecimal character reference, e.g. 0E05 for the Thai character kho khon.
SUE can import and export files in a variety of Macintosh, ISO, Windows and DOS code pages as well as UTF-7, UTF-8 and 16-bit Unicode. It can save files as text, Unicode or Textension (the format of Apple’s Multilingual Text Editor technology). SUE has no direct support for HTML, but you can type in HTML tags and save your file with UTF-8 character encoding in order to produce multilingual Web pages. SUE is written by Tomasz Kukielka and is available from the SUE Web page.

TextEdit
TextEdit is an editor for formatted text that uses RTF (Rich Text Format) as its native format. It can also open and save plain text files in UTF-8, UTF-16, Western (Mac and Windows), Japanese, Korean, Simplified Chinese and Traditional Chinese. It supports Apple’s Extended Roman and Unicode Hex Input keyboards.
TextEdit is supplied with Mac OS X 10.1, and is installed as part of a default installation.

ThinkFree Write
ThinkFree Write is a Java-based word processor that can read and write RTF (Rich Text Format) files, Microsoft Word files (including multi-script files produced with Word for Windows) and HTML files with UTF-8 encoding. It supports Apple’s Extended Roman, Unicode Hex Input and other Unicode keyboards.
ThinkFree Write is part of ThinkFree Office. The suite costs US $49.95, but a fully-functional 30-day trial version is available.

Word:mac v. X
Microsoft’s Word:mac v. X word processor for Mac OS X 10.1 uses the same file format as Word 97, Word 2000 and Word 2002 for Windows, but cannot read multi-script documents from Word for Windows. Multiple scripts are retained if native Word:mac v. X documents are transferred to Word 97, Word 2000 or Word 2002, and when Unicode (UTF-16) text and HTML (UTF-8) pages are produced. Word:mac v. X does not support Apple’s Extended Roman, Unicode Hex Input or Vietnamese Unicode keyboards. It can see Windows Unicode fonts, but it can only use them for Latin script. Word:mac v. X has a dialog box for picking characters from large fonts, accessed from Symbol... on the Insert menu. However, it does not show Unicode ranges and it does not work with all fonts (e.g. Lucida Grande contains Latin Extended-A, Latin Extended Additional, Greek and Cyrillic characters, but only MacRoman characters are shown).

Word:mac v. X can be used as a WYSIWYG HTML editor for producing multi-script Web pages. To save an existing Word document as a Web page:
To set UTF-8 as the default encoding for all HTML files produced in Word:
To create a multi-script Web page that is not based on an existing Word document:
The Formatting Palette allows you to format your text. To type in other scripts or languages, select the appropriate keyboard and if necessary also choose an appropriate font. If you need to work directly on the HTML code, open the View menu and select HTML Source. To revert to the normal WYSIWYG view, open the View menu and select Exit HTML Source.

Word 2004
Microsoft’s Word 2004 word processor for Mac OS X 10.2.8 onwards uses the same file format as Word 97, Word 2000, Word 2002 and Word 2003 for Windows, and is the first version of Word for Mac to make use of the operating system’s Unicode support, including the Unicode keyboards and fonts. It is supplied with a range of Unicode fonts that enable it to display many multi-script documents from Word for Windows. However, it does not support editing of right-to-left scripts (e.g. Arabic and Hebrew) or complex scripts such as Thai and the Indian languages. The Insert Symbol dialog box does not show all fonts or all characters, but Apple’s Character Palette can be used instead. A 30-day trial version of Microsoft Office 2004, which includes Word 2004, is available from Office 2004 Test Drive.

Copyright © 2001–2005 Alan Wood