Mailing List CyrTeX-en@vsu.ru Message #186
From: Ruprecht von Waldenfels <h0444tuv@rz.hu-berlin.de>
Subject: Re: Russian/Polish/German in one unicode document without explicit switching?
Date: Wed, 22 May 2002 13:44:43 +0200 (MET DST)
To: <CyrTeX-en@vsu.ru>

Dear Vladimir,

well, you're right about the language switching: it will always be
needed. But if one needs only isolated examples in a few languages other
than that of the main text, the problem is not so grave (though it still
exists).

In the end, one could specify (again in the preamble) which language
each script stands for. If one uses only as many languages as scripts,
this would solve the problem. For example, I could write in German and
use Russian examples extensively without switching, while incorporating
French, Polish and English passages with explicit labeling.
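
A rough sketch of such a script-to-language table, written in Python
rather than TeX. The table SCRIPT_LANGUAGE and the function language_for
are hypothetical names made up for illustration, and the sketch relies on
the assumption that the script can be read off the start of a character's
Unicode name:

```python
import unicodedata

# Hypothetical table: which language each script stands for in this document.
SCRIPT_LANGUAGE = {"CYRILLIC": "russian", "LATIN": "german"}


def language_for(char, default=None):
    """Guess a character's language from the script in its Unicode name."""
    # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER KA".
    name = unicodedata.name(char, "")
    for script, language in SCRIPT_LANGUAGE.items():
        if name.startswith(script):
            return language
    # Digits, punctuation and spaces belong to no script in the table.
    return default
```

Characters without a script of their own (digits, punctuation) give no
hint, so they would simply keep whatever language is currently active.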

I realize, however, that it is not me who will do this, but still: one
would only need to find all Cyrillic pieces and enclose them in the
corresponding tags, i.e., one would need to save which language was
active before the Cyrillic part and switch back to that language
afterwards. I do think this would be a pretty useful thing to have.
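
As a minimal sketch of such a preprocessor (not an existing tool), the
Python below wraps each run of Cyrillic text in babel's
\foreignlanguage{...}{...}, which restores the previous language by
itself once its argument ends. It assumes that babel is loaded with the
russian option, that all Cyrillic text is Russian, and that only the
basic Cyrillic block U+0400-U+04FF matters; wrap_cyrillic and
CYRILLIC_RUN are names invented here:

```python
import re

# A run of Cyrillic words (basic Cyrillic block U+0400-U+04FF), joined by
# spaces or simple punctuation. Trailing punctuation stays outside the run.
CYRILLIC_RUN = re.compile(
    r"[\u0400-\u04FF]+(?:[ ,.;:!?-]+[\u0400-\u04FF]+)*"
)


def wrap_cyrillic(text, language="russian"):
    """Enclose every Cyrillic run in \\foreignlanguage{<language>}{...}."""
    # A function is used as the replacement so that the backslash in the
    # command name is not interpreted as a regex escape.
    return CYRILLIC_RUN.sub(
        lambda m: "\\foreignlanguage{%s}{%s}" % (language, m.group(0)),
        text,
    )
```

Called on "Das Wort книга ist kurz.", this leaves the German untouched
and wraps only the Russian word. A real tool would also have to skip
comments, verbatim material and macro names.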

Unfortunately, I do not know of any book that would enable me to write
such a thing. The TeXbook is just too much reading for me at the moment,
and something like "TeX programming in a nutshell" has not crossed my
path yet. Otherwise, I would like to do something like that.

Anyway, thanks everyone for the help. I will certainly try Omega sometime
soon and report on it; for the time being, I'm stuck with good old
explicit labeling, as my text editor (UltraEdit) somehow does not
correctly encode Unicode text (it displays it, but there is no way I can
make it let me edit in Unicode). I will need to switch to Emacs-Mule
(sometime soon).

Cheers
Ruprecht

On Tue, 21 May 2002, Vladimir Volovich wrote:

> Dear Ruprecht,
>
> "RvW" == Ruprecht von Waldenfels writes:
>
>  RvW> thanks a lot for your advice. I will try it out as soon as
>  RvW> possible.  Can you answer me one question: why is it that there
>  RvW> is no easy way of implicit switching for fonts?
>
> The problem exists because of this:
>
>   - TeX's fonts are 8-bit fonts with no more than 256 characters, so
>     switching fonts is necessary
>
>   - for each character, we must know which font encoding provides
>     that character - e.g. if the current font encoding is T2A
>     (Cyrillic), and the next character is Aacute, which is unavailable
>     in T2A but available in T1, TeX must switch to the T1 font encoding.
>
> I think that it is possible to extend the utf-8 package to do this
> switching transparently.
>
>  RvW> In the end, one could imagine a font encoding that encompasses
>  RvW> all unicode characters - or, more practical, a scheme that would
>  RvW> generate these encodings out of several specific encodings
>  RvW> (maybe in the preamble of the document, one would have to
>  RvW> specify these encodings).
>
> Omega can deal with 16-bit fonts, and there one can typeset without
> switching font encoding.
>
> But - see my reply to Laurent in CyrTeX-en list - it appears that
> LANGUAGE switching markup commands are needed anyway - for good
> hyphenation, and then we can put font-switching commands in them.
>
> So, you have to use additional macros for switching between languages,
> and there is no way to avoid that - either in TeX or in Omega or
> elsewhere - simply because nobody other than a human being knows which
> languages are used in the document.
>
>  RvW> Actually, I really don't know much about the TeX internals
>  RvW> concerning such a question, and if your face has turned into a
>  RvW> frown reading this and you're thinking "For God's sake, how on
>  RvW> earth am I supposed to explain this to somebody who has just the
>  RvW> slightest of ideas of what's behind it all?", then please don't
>  RvW> bother - I just thought maybe there is a simple reason
>  RvW> behind it.  (Maybe it's the kerning question, for example - it
>  RvW> would of course be difficult to generate those values, I
>  RvW> figure.)
>
> yes, kerning (and ligatures) are among the things that suffer from font
> switching. but in most cases, each individual word can be typeset
> using one 8-bit font encoding, so if we switch font encoding while
> typesetting the first letter of the word, it will not break kerning.
>
>  RvW> By the way, one solution to this problem would be to preprocess
>  RvW> the file and insert these \cyrcod \latcod commands, finding font
>  RvW> boundaries by using regular expressions - don't you think this
>  RvW> would be a solution? I understand there is some TeX syntax for
>  RvW> preprocessing a file.
>
> you are right - it is possible to extend the utf-8 package to provide
> transparent font encoding switching which will hopefully work well
> enough, but - see above - language switching
> commands are NECESSARY anyway for good hyphenation, so we can put
> encoding switching inside them.
>
> and that nullifies the whole idea of improving the utf-8 package to
> automatically switch font encoding.
>
> Best,
> v.
>
>

