Date: Wed, 22 May 2002 13:44:43 +0200 (MET DST)
From: Ruprecht von Waldenfels
To: CyrTeX-en@vsu.ru
Subject: Re: Russian/Polish/German in one unicode document without explicit switching?

Dear Vladimir,

well, you're right about that language-switching thing; it will always be needed. BUT if one only needs isolated examples in several languages other than the one the main text is written in, the problem would not be so grave (though it would still exist). In the end, one could specify (again in the preamble) which script belongs to which language. If one uses only as many languages as there are scripts, this would solve the problem. An example would be me writing in German, using Russian examples extensively without switching, and incorporating French, Polish and English passages with explicit labeling.

I realize, however, that I will not be the one to do this, but still: one would only need to find all the Cyrillic pieces and enclose them in the corresponding tags. That is, one would need to save which language was active before the Cyrillic part and switch back to that language afterwards. I do think this would be a pretty useful thing to have. Unfortunately, I do not know of any book that would enable me to write such a thing.
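The preprocessing idea above (find every Cyrillic run, wrap it in a language switch, then return to whatever language was active before) can be sketched as a small filter. This Python sketch rests on several assumptions. It treats any run of characters from the Unicode Cyrillic block (U+0400 to U+04FF) as Russian. It assumes the main text is German. It assumes babel's `\selectlanguage` is used for the switches. The function name `tag_cyrillic` is hypothetical.

```python
import re

# One token is either an explicit \selectlanguage{...} already written by the
# author, or a run of Cyrillic words (Unicode block U+0400..U+04FF) separated
# only by whitespace.
TOKEN = re.compile(
    r"\\selectlanguage\{(?P<lang>[A-Za-z]+)\}"
    r"|(?P<cyr>[\u0400-\u04FF]+(?:\s+[\u0400-\u04FF]+)*)"
)


def tag_cyrillic(text: str, main_language: str = "german",
                 cyrillic_language: str = "russian") -> str:
    """Wrap every Cyrillic run in language switches (hypothetical helper).

    The language active before each run is remembered and restored after it,
    so explicitly labelled French, Polish or English passages stay intact.
    """
    current = main_language  # language in force at this point of the file
    out = []
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])  # copy text between tokens unchanged
        if m.group("lang"):
            # Explicit switch by the author: keep it and remember it.
            current = m.group("lang")
            out.append(m.group(0))
        else:
            # Switch to the Cyrillic-script language, then back again.
            out.append("\\selectlanguage{%s}%s\\selectlanguage{%s}"
                       % (cyrillic_language, m.group("cyr"), current))
        pos = m.end()
    out.append(text[pos:])  # trailing text after the last token
    return "".join(out)


print(tag_cyrillic("Das Wort Haus heißt auf Russisch дом."))
# prints: Das Wort Haus heißt auf Russisch \selectlanguage{russian}дом\selectlanguage{german}.
```

The "switch back" step comes from restoring `current` rather than always `main_language`. Babel's `\foreignlanguage{russian}{...}` would scope the switch without any tracking. A real preprocessor would also have to skip TeX comments and verbatim material, which this sketch ignores.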
The TeXbook is just too much reading for me at the moment, and something like "TeX Programming in a Nutshell" has not crossed my path yet. Otherwise, I would like to write something like that myself.

Anyway, thanks everyone for the help. I will certainly try Omega at some point and report on it. For the time being, I'm stuck with good old explicit labeling, because my text editor (UltraEdit) somehow does not encode Unicode text correctly: it displays it, but there is no way I can make it let me edit in Unicode. I will need to switch to Emacs-Mule (sometime soon).

Cheers
Ruprecht

On Tue, 21 May 2002, Vladimir Volovich wrote:

> Dear Ruprecht,
>
> "RvW" == Ruprecht von Waldenfels writes:
>
> RvW> thanks a lot for your advice. I will try it out as soon as
> RvW> possible. Can you answer me one question: why is there
> RvW> no easy way of implicit switching for fonts?
>
> The problem exists because of this:
>
> - TeX's fonts are 8-bit fonts with no more than 256 characters, so
>   switching fonts is necessary
>
> - for each character, we must know the font encoding from which to get
>   that character. E.g., if the current font encoding is T2A
>   (Cyrillic) and the next character is Aacute, which is unavailable
>   in T2A but available in T1, TeX must switch to the T1 font encoding.
>
> I think that it is possible to extend the utf-8 package to do this
> switching transparently.
>
> RvW> In the end, one could imagine a font encoding that encompasses
> RvW> all Unicode characters or, more practically, a scheme that would
> RvW> generate these encodings out of several specific encodings
> RvW> (maybe in the preamble of the document, one would have to
> RvW> specify these encodings).
>
> Omega can deal with 16-bit fonts, and there one can typeset without
> switching font encoding.
>
> But (see my reply to Laurent on the CyrTeX-en list) it appears that
> LANGUAGE-switching markup commands are needed anyway for good
> hyphenation, and then we can put the font-switching commands inside them.
>
> So, you have to use additional macros for switching between languages,
> and there is no way to avoid that, either in TeX or in Omega or
> elsewhere, simply because nobody other than a human being knows which
> languages are used in the document.
>
> RvW> Actually, I really don't know much about the TeX internals
> RvW> concerning such a question, and if your face has turned into a
> RvW> frown reading this and you're thinking "For God's sake, how on
> RvW> earth am I supposed to explain this to somebody who has only the
> RvW> slightest idea of what's behind it all?", then please don't
> RvW> bother; I just thought maybe there is a simple reason
> RvW> behind it. (Maybe it's the kerning question, for example; it
> RvW> would of course be difficult to generate those values, I
> RvW> figure.)
>
> Yes, kerning (and ligatures) is one of the things that suffer from font
> switching. But in most cases each individual word can be typeset
> using one 8-bit font encoding, so if we switch font encoding while
> typesetting the first letter of the word, it will not break kerning.
>
> RvW> By the way, one solution to this problem would be to preprocess
> RvW> the file and insert these \cyrcod \latcod commands, finding font
> RvW> boundaries by using regular expressions. Don't you think this
> RvW> would be a solution? I understand there is some TeX syntax for
> RvW> preprocessing a file.
>
> You are right: it is possible to extend the utf-8 package to provide
> transparent font encoding switching, which will hopefully work
> sufficiently well. But (see above) language-switching
> commands are NECESSARY anyway for good hyphenation, so we can put
> the encoding switching inside them.
>
> And that makes the whole idea of improving the utf-8 package to
> switch font encoding automatically moot.
>
> Best,
> v.
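Vladimir's two points above can be illustrated together with a small Python sketch: pick the encoding per character, but switch font encoding only at the start of a word so that kerning and ligatures inside the word survive. This is a toy under stated assumptions. The coverage sets are NOT the real T1 and T2A glyph tables, only enough characters to show the idea. `switch_encodings` is a hypothetical name. The sketch only emits the standard LaTeX `\fontencoding{...}\selectfont` commands as text and does not extend the utf-8 package itself.

```python
import re

# Toy coverage tables, an assumption for illustration only: the real T1/T2A
# encodings contain more (and somewhat different) glyphs.
ASCII = {chr(c) for c in range(0x20, 0x7F)}
COVERAGE = {
    "T2A": ASCII | {chr(c) for c in range(0x0410, 0x0450)} | {"Ё", "ё"},
    "T1": ASCII | set("ÁáÄäÖöÜüßÉéÈèÀàÇçĄąĆćĘęŁłŃńÓóŚśŹźŻż"),
}


def switch_encodings(text: str, start: str = "T1") -> str:
    """Choose one font encoding per word, switching only at word starts."""
    current = start
    out = []
    # Splitting on a captured group keeps the whitespace separators.
    for token in re.split(r"(\s+)", text):
        if not token or token.isspace():
            out.append(token)
            continue
        chars = set(token)
        if not chars <= COVERAGE[current]:
            # The current encoding lacks a glyph: find one that has them all.
            for enc, glyphs in COVERAGE.items():
                if chars <= glyphs:
                    out.append("\\fontencoding{%s}\\selectfont " % enc)
                    current = enc
                    break
            # If no single encoding covers the word, it is left as is; a real
            # implementation would have to split the word (and lose kerning).
        out.append(token)
    return "".join(out)


print(switch_encodings("Die Stadt Москва, nicht Ádám"))
# prints: Die Stadt \fontencoding{T2A}\selectfont Москва, nicht \fontencoding{T1}\selectfont Ádám
```

"nicht" stays in T2A in this sketch because T2A also covers ASCII. Only the word containing Aacute forces the switch back to T1, which mirrors the example in the quoted message.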