Mailing List CyrTeX-en@vsu.ru Message #204
From: Vladimir Volovich <vvv@vsu.ru>
Subject: Re: Russian/Polish/German...without switching
Date: Sun, 23 Jun 2002 09:07:05 +0400
To: <CyrTeX-en@vsu.ru>
Hi Laurent,

thanks for such a thorough answer. just a quick note for now: you write -

> Another competitor is perhaps one Vladimir is backing.
> Namely the unicode variant UTF8 with with (La)TeX language
> switches. Real sixteen bit unicode requires Omega. But many
> planes of unicode may be needed ie this not really 16bit...
> but 32bit reencoded to 8bit. Does that not become tangled
> and slow? As always, typing and screen viewing is a bad
> bottleneck. I would emphasize that while Linux/unix is UTF8
> oriented, Bill Gates prefers unicode.

you write "real 16-bit unicode" suggesting that UTF-8 is not a "real
unicode", but that is not so. both UTF-8 and UTF-16 (and UTF-32) are
equivalent in that they encode the same set of characters.

UTF-8 is no worse than UTF-16 or UTF-32: it is real unicode, and text in any
of these encodings can be converted to any other one-to-one, without loss.
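
To make the equivalence concrete, here is a small sketch in Python (my choice
for illustration only, nothing TeX-specific about it). The sample string and
the helper name reencode are made up for this example; the last character is
deliberately taken from outside the 16-bit range to show that "many planes"
are not a problem for any of the three encodings.

```python
# Sample text: Russian, Polish, German, plus U+1D11E (musical G clef),
# which lies outside the Basic Multilingual Plane.
text = "Привет, żółć, Grüße \U0001D11E"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # explicit byte order, no BOM
utf32 = text.encode("utf-32-le")   # explicit byte order, no BOM


def reencode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one Unicode encoding to another (hypothetical helper)."""
    return data.decode(src).encode(dst)


# The byte sequences differ, but they all decode to the same characters.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text
assert utf32.decode("utf-32-le") == text

# Re-encoding in either direction is lossless.
assert reencode(utf16, "utf-16-le", "utf-8") == utf8
assert reencode(utf8, "utf-8", "utf-16-le") == utf16
```

The encodings differ only in how many bytes each character takes, not in
which characters they can represent.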

Pure TeX can handle UTF-8 in a relatively straightforward way (as the utf-8
and ucs LaTeX packages demonstrate), and the typographical quality achievable
by pure TeX for UTF-8-encoded texts is in most cases not worse than for Omega
(the drawbacks of TeX come from 8-bit fonts and the impossibility of
cross-font kerning and ligatures, but this is rarely needed because most
scripts fit within a single 8-bit font).

Direct processing of UTF-16-encoded texts with TeX may even be possible, but
it is definitely hard to implement (that's why Omega with its OTP exists).
However, one can always re-encode the text from UTF-16 or UTF-32 to UTF-8
before processing it with TeX.
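
Such a re-encoding step is only a few lines in most languages. A rough sketch
in Python, assuming the input file starts with a byte order mark so the
"utf-16" codec can detect the byte order; the function name and the file
names in the usage note are hypothetical:

```python
import os
import tempfile


def utf16_to_utf8_file(src_path: str, dst_path: str) -> None:
    """Re-encode a UTF-16 text file (with BOM) as UTF-8.

    newline="" keeps the line endings exactly as they are in the source.
    """
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, utf16_to_utf8_file("paper-utf16.tex", "paper.tex") would produce
a file that can then be fed to TeX with UTF-8 input support.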

Yes, Omega's goals look promising, but unfortunately the development of Omega
is very slow and closed, and, more importantly, there is virtually no
documentation (and there are also compatibility issues -- Omega is far from
being stable).

So having simple support in pure TeX for processing texts in UTF-8 encoding
looks useful (of course the utf-8 package is far from covering all of
unicode, and it does not aim to).

Best,
v.