 Hi Vladimir, Leif and others
 
More thoughts on directions for 16-bit TeX (\simeq Omega).
 Excuse my being a bit prolix.
 
As everybody is dependent on standard computers, and these
seem to be drifting toward 16-bit fonts and text files,
the issue is not to provoke a change, but rather to
imagine how to derive some profit from a change that is
inevitable.  Yes, I do believe that 16-bit text will gain
ground and become universally accepted at some level that is
relevant to TeX.  Bill Gates is implementing 16-bit text, and
I imagine he hopes it will become obligatory.  I personally
hope it will just be an interesting option and a basis for
useful standards.
 
LaTeX users may well believe that the simplifications
offered by 16-bit text are quite unneeded.  Programmers
too.  However, computer OS designers who dare hope their
product will be sold with minimal adjustment worldwide
seem sure to opt for 16-bit text as a common design
feature.  In a couple of decades a majority of computers
will be sold in countries like India, China, and Japan,
where the need for 16 bits is clear.  TeX is a flea on the
back of the electronic publishing elephant; Bill Gates
goads the elephant, not the flea.
 
 Some possible desiderata for multilingual TeX typing:
 
 In typing TeX, all Cyrillic characters and punctuation should
 be distinct and disjoint from Latin characters and punctuation.
 
 The ASCII range should be reserved as the language of
 TeX programming.
 
 I'll not risk saying anything revolutionary about math; it can
 certainly continue to be coded as ASCII.
 
 In general, languages with noticeably distinct typographic
 traditions should be disjointly encoded for typing.
 
Color or style should be used on screen to distinguish
languages (and ASCII).
 
As far as TeX (Omega) is concerned, the typescript should be
nothing more than a sequence of 16-bit characters.
 
There should (ultimately) be a TeX exchange format for 16-bit
TeX typescripts.
 
 Concerning difficulty:
 
 VV> in other words, your approach is practically unusable: it
 > is hard to implement, it is non-flexible (requires changing
 > if one more language needs to be added, or if new font
 > shapes need to be used), and gives unneeded complications to
 > what can be achieved with plain markup commands.
 
Today, yes.  After all, 16-bit screen fonts with languages
disjointly stacked do not yet exist except in germinal form;
nor do decent physical typing comforts.  However, although the
approach may seem difficult to you as seen from the tangled
situation we currently live with, I suspect it is intrinsically
the simplest way to go multilingual.
 
 
 Concerning hyphenation:
 
VV> some issues with hyphenation mentioned in my previous email
> are due to the fact that english and russian have different
> righthyphenmin values: 3 for english and 2 for russian. if
> we use combined russian-english patterns, we have to use
> some setting like 2 for righthyphenmin (the minimal number of
> letters which are allowed to be cut off the end of the
> hyphenated word). the ruhyphen package provides some
> mechanism to make that work in such a way that english
> words will be hyphenated with the 3-letter limit, but that
> works only mostly, not 100% equivalent to the case when the
> languages are separated (and language-switching markup
> commands are used).
 
The "hyphenmin" problem is real with current TeX, so probably it
is also real with Omega (though I'm not sure).  But as you
explain, the problem is by now largely solved where
Russian is concerned.  Three more points:
 
-- even where no special patterns have been made to solve
this problem, one may get good results by using the value 3
for righthyphenmin, just so long as no bad breaks crop up.

-- Omega might someday introduce a 'character class' feature
in its hyphenation mechanism to solve the hyphenmin problem
with just a couple of patterns!

-- In rare cases of need (as when bad line breaks crop up)
one is free to change the hyphenmin values ad hoc; see the
sketch after this list.
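
To be concrete, here is a minimal plain-TeX sketch of the
language switching behind these points; the \language numbers
are hypothetical, standing for whatever numbers were assigned
when the hyphenation patterns were loaded:

    % hypothetical \language numbers, fixed when patterns were loaded
    \chardef\english=0
    \chardef\russian=1
    % English convention: never break fewer than 3 letters off a word end
    \def\useenglish{\language=\english \lefthyphenmin=2 \righthyphenmin=3 }
    % Russian convention: 2-letter fragments at a line end are acceptable
    \def\userussian{\language=\russian \lefthyphenmin=2 \righthyphenmin=2 }

An ad-hoc repair is then just \righthyphenmin=3 typed in
mid-paragraph.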
 
 Concerning punctuation:
 
 VV> also, there are some typographical issues (russian has
 > frenchspacing and some special typesetting of :;?! signs,
 > etc). so in general it is better to use language switching
 > commands, so that typographical features of all languages
 > will be activated at their full volume (if you don't switch
 > languages, the typographical rules are shared for all
 > languages).
 
This seems wrong.  On the contrary: do not some Russian
typographers want "nonfrenchspacing" for Russian, and also
special national punctuation with national side-bearings
(different from English)?  Today "nonfrenchspacing" is
inaccessible to them.  Even for French, disjoining ASCII from
French punctuation is very desirable today, to avoid having
punctuation of 'active' category when TeX inputs the typing.
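
For the record, here, in simplified plain TeX, is roughly the
kind of catcode trick the French styles must resort to today;
it is exactly this that disjoint French punctuation would
render unnecessary:

    % make `!' active so that TeX itself supplies the French thin space
    \catcode`\!=\active
    \def!{\unskip\thinspace\char`\!}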
 
In the 16-bit world, the Knuthian \sfcode mechanism (used
character by character) has enough flexibility today to impose
frenchspacing in a language-dependent way.  Thus in the stacked
encoding scheme I am recommending, each language has its own
punctuation with tailor-made side-bearings, and frenchspacing
could even be turned on or off language by language.
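
A minimal sketch, assuming a 16-bit Omega whose \sfcode table
covers the full character range, and assuming a purely
hypothetical Russian full stop sitting at position "070E:

    \sfcode`\.=3000    % ASCII full stop: English sentence stretch
    \sfcode"070E=1000  % hypothetical Russian full stop: frenchspacing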
 
VV> but what encoding would these disjointed files use? it will
> not be unicode (neither utf-8 nor utf-16 nor whatever),
> because in unicode all latin letters are joined.
 
Somewhere in Unicode there are regions reserved for private
use; why not TUG use?  Standards are a good thing; let's have
enough of them!  Russian might encode on "0700--"07FF and
French on "2100--"21FF (placement motivated by well-known
international telephone prefixes).  Alternatively, the
placement of a chunk like T2A in such a 16-bit encoding could
be left indeterminate and rather declared in a header.  This
would avoid TUG agonizing over a basically fatuous placement
decision, and it would also make the 16-bit system adequate
for any real manuscript without *ever* going to 32-bit TeX.
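
Purely by way of illustration (the header syntax here is
entirely invented), such a typescript might open with lines
like:

    %%chunk T2A at "0700   % the Cyrillic stack
    %%chunk T1  at "2100   % the French stack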
 
 VV> i'd like to emphasize that the encoding where latin letters
 > are separated for different languages (as well as cyrillic,
 > e.g. for Russian and Ukrainian) has to be non-standard and
 > custom (specific to the languages being used), because all
 > existing character encodings do not support the disjoint
 > latin alphabet for different languages.
 
 Indeed.  Many, many, many languages (not just Russian, and
 possibly a majority) would benefit typographically
 from having their own optimal punctuation -- something that
 only English really has now.
 
VV> in your editor you have to use some keystrokes (or mouse
> clicks) to switch from one "color" (language) to another.
 
I suggested using function keys to change language.  Then the
keyboard encoding will change (but not the 16-bit screen font
encoding); also the color changes (or maybe the style).
Most important is that the 16-bit text file character sequence
tells all, though a header linking to preexisting encodings is
probably wanted; that text file goes straight to 16-bit TeX
for typesetting.
 
 TeX implementors do not build text editors; they merely
 adapt them.  I hope enough suitable editors will appear in
 the normal course of progress.  On the Mac it seems
 probable that they will appear in the next 3  to 10 years.
As Leif has emphasized, Nisus on the Mac seems fertile ground
for this development.  Why not MSWord and WordPerfect?
 
 Finally, let me answer your summarized complaints in a
 summarized way:
 
 VV> home-made encoding
 
 Not really, if assembled from standard pieces like T2A in
 a way specified in a header.
 
 VV> (font coloring) you will use mouse/etc to "color" text
 
An intrinsically colored 16-bit screen font might be best.
 That seems foolproof.  Then one just changes keyboard
 (physical or virtual) to access different parts of it.
 
 VV> special editor (so
 > not everybody will be able to use the TeX source
 > files written in that home-made encoding)
 
We have to wait for 16-bit screen fonts and capable editors.
Incidentally, I expect (and hope) that virtual 16-bit screen
fonts will be based on preexisting 8-bit screen fonts.  MSWord
and Nisus and other classical word processors are remarkably
close to that.  But I hope to see programmers' editors enter
the fray (see BBEdit on the Mac).
 
 VV> special VF fonts
 
I hope a future Omega would automatically assemble them
from preexisting pieces, namely national TeX efforts such
as the T2A-encoded Cyrillic fonts.
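
To suggest what such automatic assembly might emit, here is a
fragment of a virtual property list of the kind ovp2ovf
accepts.  I take larm1000, a T2A-encoded LH font, as the raw
font; the slot numbers are merely illustrative:

    (VTITLE Sketch: one Cyrillic slot of a 16-bit virtual font)
    (MAPFONT D 0 (FONTNAME larm1000))
    (CHARACTER H 710
       (COMMENT slot "0710 of the stacked font, taken from slot "10
          of larm1000; the width below must be copied from its metrics)
       (CHARWD R 0.472223)
       (MAP (SELECTFONT D 0) (SETCHAR H 10)))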
 
 VV> mouse/etc to "color" text
 
 No, just a function key to fetch the keyboard of the new
 language; color (or style) changes should be part of the
 screen font.
 
 Let me counter all this negative criticism by suggesting
 that the scheme I am conjuring compares well with preexisting
 competitors for any highly multilingual typescript -- meaning
one which cannot be typed and read using a single 8-bit
 font.
 
One competitor that comes to mind is a classical word
processor using several text fonts and embedded (La)TeX
language switches.  One easily goes to TeX via 8-bit text.
But what about typescript porting and archiving?  RTF is a
poor candidate, and others are worse.
 
Another competitor is perhaps the one Vladimir is backing:
namely, the Unicode variant UTF-8 with (La)TeX language
switches.  Real sixteen-bit Unicode requires Omega.  But many
planes of Unicode may be needed, i.e. this is not really
16-bit but 32-bit re-encoded to 8-bit.  Does that not become
tangled and slow?  As always, typing and screen viewing are a
bad bottleneck.  I would emphasize that while Linux/Unix is
UTF-8 oriented, Bill Gates prefers raw 16-bit Unicode.
 
Just to confuse matters, I add that these three competitors
(my scheme among them) can be cross-bred to beget others...
 
All three competitors seem worth discussing, and I hope
that discussion will map the territory ahead for the day when
the text-processing prerequisites do become available.  Maybe
even in time for LaTeX 3?
 
 Cheers
 
 Laurent S.
 
PS.  On the (La)TeX output side, this development does not
concern raw fonts with real Type 1 or bitmapped glyphs
attached.  Anything new at the font level would be realized
using virtual fonts.
 
 
 